# Tutorial

The first section of this tutorial is written to be presented in Google Colab, which provides a "batteries included" environment, so no installation is required. After that, there is an installation section which should contain all the instructions needed to run the same notebook locally on your own device.

## Learning Objectives

By the end of this tutorial you should be able to

* Open a notebook in Colab (You are here!).
* Use basic python commands and syntax.
* Use common libraries useful for Artificial Intelligence (AI) and Machine Learning (ML).
* Make a function that opens another file.
* Download the contents of a git repository.
* Set up a conda virtual environment.

## Python Logic

Logical operators in python evaluate expressions to one of two values: `True` or `False`.
This is analogous to computer machine code, which uses bits (that is, 0 and 1) to execute instructions and store data. These values in python are called "booleans" after [George Boole](https://en.wikipedia.org/wiki/George_Boole), a mathematician who extensively studied the algebra of logic.

We use comparison operators (`==`, `>`, `<`), the membership operator `in`, and logical operators such as `not` to build boolean expressions that control how code branches.

### [Conditionals and control flow](https://docs.python.org/3/tutorial/controlflow.html#more-control-flow-tools)

The `if` keyword is used in python to create code branches.
If an _if_ statement's condition evaluates to `True`, the interpreter executes the code in the [suite](https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-suite) (the indented block under the statement).
Otherwise, the interpreter skips the suite and moves on to the next piece of code.

There can be zero or more `elif` parts, and the `else` part is optional. The keyword `elif` is short for `else if`, and is useful to avoid excessive indentation.
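
As a small illustration (the function name and temperature thresholds here are made up for this example), an `if`/`elif`/`else` chain checks each condition in order and runs only the first suite whose condition is `True`:

```python
def describe_temperature(celsius):
    """Classify a temperature using hypothetical example thresholds."""
    if celsius < 0:
        return "freezing"
    elif celsius < 20:
        return "mild"
    elif celsius < 30:
        return "warm"
    else:
        # Runs only if none of the conditions above were True
        return "hot"

print(describe_temperature(-5))  # freezing
print(describe_temperature(25))  # warm
print(describe_temperature(35))  # hot
```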

### For Loops

A `for` loop is a shortcut, or syntactic sugar, used to process each item in an iterable (an object, such as a list, that can hand over its items one at a time).
It is similar to $\Sigma$ summation notation in mathematics; for example,
$\displaystyle \texttt{total} = \sum_{i=0}^{n-1} i$ is analogous to the following python code:
```
n = 10  # any non-negative integer
total = 0
for i in range(n):
    total = total + i
```
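
To check the analogy, the loop can be compared with python's built-in `sum` and with the closed form $\sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}$. Here `n = 10` is an arbitrary choice:

```python
n = 10  # arbitrary example value

# The for loop version of the summation
total = 0
for i in range(n):
    total = total + i

# The built-in sum does the same thing in one call
builtin_total = sum(range(n))

# Closed-form formula for 0 + 1 + ... + (n - 1)
formula_total = n * (n - 1) // 2

print(total, builtin_total, formula_total)  # 45 45 45
```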

In [None]:
# Test if a number is even
number = 6
if number % 2 == 0: # % is the modulo operator: the remainder after dividing by 2
    print("This number is even!")
else:
    print("This number is odd")

In [None]:
# Print every number in a list
for number in [6, 28, 496, 8128, 33550336]:
    print(number)

### Knowledge Checkpoint!

Now is the time to check your understanding.

Below is the skeleton for a function (if any term is confusing, please ask and we will explain) that makes a list of the numbers `1` to `N`. Complete the function so that it:

1. Checks whether each number is odd or even
2. Adds the even numbers to a running total
3. Skips the odd numbers

Run the test cell after the function to verify you've got the right results!

(Hint: if you are struggling, make sure the code is doing what you think it is by using `print` with any variables you are unsure about.)

<details><summary>

### Aside: List comprehensions

</summary>

The venerable for loop is a mainstay of computer programming, but in python it offers more flexibility than many tasks need. Often what is really wanted is to take a list of items and create another list with the items transformed or filtered in some way. Python has a specific pattern for this type of transformation called a list comprehension. A list comprehension is an expression rather than a statement, so it can be used anywhere a value is expected, and in many cases it runs faster than the equivalent loop that calls `append` on every iteration. Consider this article from 2004 on [efficient string concatenation][python-string-concat], which compares the speed of several solutions. The trade-off is that you lose control flow statements such as `break` and `continue`, but most uses of `continue` (skipping items) can be expressed with the comprehension's `if` clause. Here is an example of a list comprehension computing all the even squares less than 100:

```python
[ x**2
  for x in range(12)
  if (x**2 % 2) == 0
  if x**2 < 100
]
```

This is equivalent to:
```python

resulting_list = []
for x in range(12):
  if (x**2 % 2 == 0) and (x**2<100):
    resulting_list.append(x**2)
  else:
    pass
```
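
A quick way to convince yourself that the two versions are equivalent is to run both and compare the results (the variable names here are just for illustration):

```python
# List comprehension version
comprehension_result = [
    x**2
    for x in range(12)
    if (x**2 % 2) == 0
    if x**2 < 100
]

# Explicit for loop version
loop_result = []
for x in range(12):
    if (x**2 % 2 == 0) and (x**2 < 100):
        loop_result.append(x**2)

print(comprehension_result)  # [0, 4, 16, 36, 64]
assert comprehension_result == loop_result
```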


This is an example of a general pattern using list comprehensions:

```python
[
    transform(item)
    for item in items
    if passes_test(item)
]
```

Which behaves like:
```python

resulting_list = []
for item in items:
  if passes_test(item):
    resulting_list.append(transform(item))
  else:
    pass
```


Another benefit of list comprehensions is readability, but that is more difficult to measure objectively.

[python-string-concat]: https://waymoot.org/home/python_string/

</details>

In [None]:
def even_adder(n_digits=20):

    digit_list = [i+1 for i in range(n_digits)]
    even_sum = 0
    for digit in "":
        if '':
            even_sum = ""
        else:
            ''

    return even_sum

In [None]:
assert even_adder(20) == 110
assert even_adder(12) == 42
assert even_adder(4) == 6

### Functions and operations

In python, functions are reusable blocks of code that can be called with different arguments (the inputs a function receives).

In the same way that a mathematical function such as $f(x,y)$ can give you 5 or 9 depending on which `x` and `y` you give it, a python function can produce different results for different inputs.

If you ever find yourself repeating code many times in your program, it is a good idea to put it in a function! If any changes need to be made, they can then be made in one place, which reduces the possibility of introducing errors.

To make a function, use the `def` keyword followed by the function's name and its parameters in parentheses `()`; the function may return a value using the `return` keyword. There are some, hopefully helpful, examples below.

Parameters can also be given default values by assigning them in the function definition. When the function is called, any argument that has a default can be left out, and the default value is used instead. This can enhance readability and reduce development time if used judiciously.
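
For example, in this sketch (the function `greet` and its `greeting` parameter are made up for illustration), the second argument can be left out, in which case its default is used:

```python
def greet(name, greeting="Hello"):
    # `greeting` has a default value, so callers may omit it
    return f"{greeting}, {name}!"

print(greet("Ada"))                    # Hello, Ada!
print(greet("Ada", greeting="Howdy"))  # Howdy, Ada!
```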

In [None]:
# Basic examples

def do_addition(x, y):
    return x + y

def open_file(file_path):
    with open(file_path, 'r') as f: # a context manager using the `with` keyword
        opened_file = f.read()
    return opened_file

def tell_me_what_im_thinking_of(what_im_thinking_about, polite=False):
    oracles_response = f"You are thinking about {what_im_thinking_about}."
    if polite:
        oracles_response += " Isn't that right, dear?"

    # Return the string (rather than printing it) so the caller decides what to do with it
    return oracles_response

In [None]:
print(do_addition(1, 2))
print(do_addition(6, 7))
print(tell_me_what_im_thinking_of("A nice lunch", polite=True))
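
The `open_file` function above is not called yet. One way to try it without needing an existing file is to write a small temporary file first. This sketch uses the standard library `tempfile` module, and repeats `open_file` so that it runs on its own; the file name and contents are made up:

```python
import os
import tempfile

def open_file(file_path):
    with open(file_path, 'r') as f:  # the file is closed automatically when the block ends
        opened_file = f.read()
    return opened_file

# Create a temporary directory that is deleted when the `with` block ends
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example.txt")
    with open(example_path, 'w') as f:
        f.write("Hello from a file!\n")

    contents = open_file(example_path)

print(contents)  # Hello from a file!
```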

### Knowledge Checkpoint!

Below are a few lines of code that repeat a common operation. Turn that code into a function!

The next cell has a few statements to make sure you've got the right answer; try those out to check your work!

In [None]:
total = 0

total = total * 2
total = total + 1
total = total * 2
total = total + 1
total = total / 3
total = total * 2
total = total + 1

In [None]:
def skeleton_function():
    pass

In [None]:
assert skeleton_function(2) == 5
assert skeleton_function(3) == 7

## Using Git

The `git` tool is a distributed version control system and the _lingua franca_ of open source software. If you want to store code and keep track of its versions, `git` is the tool to use. `GitHub` and its (slightly less Microsoft-owned) counterparts such as `GitLab` are `git` repository hosting platforms: they give the code a place to live, and they are fundamental to how modern software development is practiced.

For our purposes, we'll be using `GitHub` to organize the code used for the learning sessions. If you are so inclined, you can also `fork` these tutorials and keep a copy for yourself!

## Common packages

The beauty of the python ecosystem is that it has some of the broadest and most complete package support compared to other languages. There are several incredibly powerful packages ready for you to use for AI/ML/Data Science, so we will introduce you to some of the big ones.

### Numpy
Numpy might need no introduction, but for completeness we will introduce it anyway.
It is THE array and matrix operation package.
It has lovely data structures (chiefly the n-dimensional array, `ndarray`), and a powerful C-based backend.
If you need to do any numerical mathematics beyond "1+1", then Numpy is the way to go.

* [Numpy Documentation](https://numpy.org/doc/stable/)
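
Before the task below, here is a minimal sketch of what makes numpy convenient: arithmetic on arrays is applied element by element without writing a loop, and the `@` operator performs matrix multiplication. The values are arbitrary:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Element-wise arithmetic: no for loop needed
print(a + b)  # [11. 22. 33.]
print(a * 2)  # [2. 4. 6.]

# Matrix multiplication with the @ operator
m = np.array([[1, 2],
              [3, 4]])
identity = np.eye(2)
print(m @ identity)  # same values as m (as floats), since identity is the identity matrix

# Summary statistics are methods on the array
print(a.mean(), a.std())
```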

In [None]:
# Import the package so we can use it in the code
import numpy as np # The common alias

In [None]:
# Task
# Generate random data and verify that its standard deviation and mean are
# what they are supposed to be

# First: generate the distribution
# Reference the documentation to see which parameters the function requires and
# what they mean:
# https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html

distribution_size = 200
# normal(loc, scale, size): loc is the mean and scale is the standard deviation
normal_distribution = np.random.default_rng().normal(0, 0.5, distribution_size)

In [None]:
# Check we've got the right size!
# `shape` is an attribute of the array, not a method, so we do not call it with ()
normal_distribution.shape

In [None]:
# Now let's check the distribution for the mean and the standard deviation
# (which should be almost the same as our parameters from before)
# std and mean are functions, so they need to be called with ()
normal_distribution.std()

In [None]:
normal_distribution.mean()

#### Bonus! Add noise to that distribution.

When we refer to "noise", you can think of it like actual audio noise.
If you record your voice on a cheap microphone, there will be a lot of extra crackles and sounds besides your voice.
Your voice here is the 'signal', which we care about, and the microphone interference is the 'noise'.
The same principle applies to any sort of signal: audio, visual (think about JPEG compression artifacts!), or numerical.

Here, we'll use two main types of noise: 'normal' (or 'gaussian') noise, which follows the classic Tall Around The Mean, Short Around The Edges bell curve, and 'uniform' noise, which is equally likely to take any value between two bounds.

Gaussian noise is very commonly used in scientific contexts because gaussians are well understood and easy to model, and because the combined effect of many small, independent random disturbances tends toward a gaussian distribution (the central limit theorem).
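
To get a feel for the difference, this sketch draws many samples of each kind of noise and compares their spread. For uniform noise between $a$ and $b$, the standard deviation is $(b-a)/\sqrt{12}$, so uniform noise on $[-1, 1]$ has a standard deviation of about 0.577. The sample size and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the results are repeatable
n_samples = 100_000

gaussian_noise = rng.normal(loc=0.0, scale=1.0, size=n_samples)
uniform_noise = rng.uniform(low=-1.0, high=1.0, size=n_samples)

# Gaussian: most values near the mean, a few far out in the tails
print("gaussian std:", gaussian_noise.std())  # close to 1.0
print("gaussian range:", gaussian_noise.min(), gaussian_noise.max())

# Uniform: values never leave [low, high)
print("uniform std:", uniform_noise.std())  # close to 2 / sqrt(12), about 0.577
print("uniform range:", uniform_noise.min(), uniform_noise.max())
```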

In [None]:
# When we say "noise", we mean some other distribution that can change the shape
# of the distribution.
# Let's try with a uniform distribution - from here:
# https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.uniform.html#numpy.random.Generator.uniform

noise_shape = "?"
noise_lower_bound = "?"
noise_upper_bound = "?"

noise = np.random.default_rng().uniform(
    noise_lower_bound, noise_upper_bound, noise_shape
)

# To apply that to the original distribution, we need to do some array manipulation
# Looky here! https://numpy.org/doc/stable/reference/arrays.ndarray.html#arithmetic-matrix-multiplication-and-comparison-operations
noisy_normal_distribution = "?"

In [None]:
# Check by making sure the std is around the same as the original

# The difference should be less than 0.1. Adding a small, even amount of uniform
# noise to every value only changes the spread slightly.
# Think about what would happen if you dusted sand evenly over a sand castle. If you do it perfectly evenly, it doesn't change the spread of the castle very dramatically.
assert abs(noisy_normal_distribution.std() - normal_distribution.std()) < .1

### Pandas
Pandas is another popular data analysis package. It is great for viewing and slicing data and taking control over your input data. Its tables (DataFrames) are easy to play with, so you can quickly describe your data and select subsets of rows based on their contents.
It also uses a C backend in some spots, so it's pretty dang quick where it counts.

- [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/)

In [None]:
import pandas as pd # common alias!

In [None]:
# Task
# Make a DataFrame full of new (random) data and run some statistics

# A pandas DataFrame can be constructed from a lot of different objects.
# For this tutorial, we'll make a dictionary with numpy arrays in it
# and transform that into a DataFrame we can do operations on!

dataframe_length = 20
initial_values = {
    # Random values drawn uniformly between -1 and 1
    # (the default low and high for uniform are 0 and 1)
    "uniform_distribution": np.random.default_rng().uniform(
        -1, 1, size=dataframe_length
    ),
    # A non-random sequence from -1 to 1. The function linspace produces an array
    # of evenly spaced values
    "linear_space": np.linspace(-1, 1, dataframe_length),
    # The default gaussian has a mean of 0 and a standard deviation of 1.
    "guassian_noise": np.random.default_rng().normal(size=dataframe_length)
}

# Transform the dictionary into a DataFrame
spaces_dataframe = pd.DataFrame(initial_values)


In [None]:
# Let's now run some stats on this!

# To access a column of a DataFrame, we index it with the column name

# unique() shows all the distinct values in a column, which is handy for
# categorical variables (every value here is distinct, since linspace never repeats)
spaces_dataframe["linear_space"].unique()

In [None]:
# Can also look at the full dataframe with the `describe` method to see a lot of
# stats for all the columns

spaces_dataframe.describe()

In [None]:
# We can also take a subset of the DataFrame based on a condition

# Let's evaluate only the negative values in the linear_space column:
spaces_dataframe[spaces_dataframe['linear_space']<0].describe()

In [None]:
# Also, let's add a new column to the dataframe that is a combination
# of existing data from our table

spaces_dataframe['noisy_linspace'] = (
    spaces_dataframe['linear_space'] + spaces_dataframe['guassian_noise']
)

spaces_dataframe['noisy_linspace'].describe()

In [None]:
# If we decide some data isn't important, we can just get rid of it

spaces_dataframe.drop(['noisy_linspace'], inplace=True, axis=1)
# inplace=True -> modify spaces_dataframe itself instead of returning a new copy without "noisy_linspace"
# axis=1 -> drop columns (axis=0 would drop rows)
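
A note on `inplace=True`: without it, `drop` leaves the original DataFrame alone and returns a new one, which you would assign to a name. A minimal sketch with a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"keep_me": [1, 2, 3], "drop_me": [4, 5, 6]})

# Without inplace=True, drop returns a new DataFrame and df is unchanged
smaller_df = df.drop(columns=["drop_me"])
print(list(df.columns))          # ['keep_me', 'drop_me']
print(list(smaller_df.columns))  # ['keep_me']

# Dropping by index label removes rows instead (this is what axis=0 does)
fewer_rows = df.drop(index=[0])
print(len(fewer_rows))  # 2
```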

In [None]:
# Task 2 - Save and re-load that data

# It is also important to know how to save existing data, so that you can use it
# later, or if you do not want to re-generate it each time.
# Let's save this DataFrame as a CSV (comma separated values) file and
# load it up again to make sure it is the same as it was before:

save_location = "./example_dataframe.csv" # by convention, use the .csv extension
spaces_dataframe.to_csv(save_location, index=False)
# Setting index=False skips writing the pandas-assigned row index, which is not
# part of our data (if saved, it comes back as an extra 'Unnamed: 0' column).

In [None]:
# And load it back in to make sure our values are the same!
matrix_reloaded = pd.read_csv(save_location)

# Make sure the column names match, in the same order:
assert list(matrix_reloaded.columns) == list(spaces_dataframe.columns)
# and compare all the values in those columns
for column in spaces_dataframe.columns:
    # .values changes the pandas.Series object into a numpy array, which makes it
    # easier to compare. np.allclose tolerates tiny floating point rounding
    # differences that can appear after writing numbers to text and reading them back.
    assert np.allclose(matrix_reloaded[column].values, spaces_dataframe[column].values)

In [None]:
# Your challenge, if you choose to accept it:

# Make a new dataframe
# Show the standard deviation and mean of a random subset of it:

new_dataframe = pd.DataFrame()

new_dataframe["?"] = "?" # constructing the DataFrame by adding columns one by one
new_dataframe["?_2"] = "?"

selected_index = '?' # e.g. a random selection of row labels from new_dataframe.index

subset_dataframe = new_dataframe.loc[selected_index]

In [None]:
subset_dataframe.describe()

### scikit-learn

The scikit-learn package has some of the best off-the-shelf ML algorithms. It is very easy to use and full of useful diagnostics that help both beginners and seasoned practitioners.

Much of it is organized around classification (which category does this belong to?) and regression (how much of something is there?) tasks. If there is an ML algorithm you want to use, scikit-learn is the first place to look.

- [scikit-learn Documentation](https://scikit-learn.org/stable/)


For this example, we'll use the [Iris Classification dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris), and try to train a model that sorts the flowers into their defined classes.
First, we'll look at the classes, and then use a [Decision Tree](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) to try to split them up. Briefly, a decision tree learns a sequence of yes/no questions about the features (for example, "is the petal length less than some value?") that best separates the input data into the different classes. You can think of it like an ant crawling up a tree, deciding at each fork which branch is best to go down.
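
To make the idea of choosing the best split a little more concrete, here is a simplified sketch of how a single split could be chosen for one feature. It scores candidate thresholds with the Gini impurity, which is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. The function names and toy data are hypothetical, and the real implementation considers every feature, grows many levels, and is considerably more involved:

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means the group is pure."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Try midpoints between sorted unique values; return (threshold, weighted impurity)."""
    unique_values = sorted(set(values))
    best = (None, float("inf"))
    for low, high in zip(unique_values, unique_values[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        # Weight each side's impurity by how many samples it holds
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Toy data: one feature that separates two classes cleanly
petal_lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]
print(best_threshold(petal_lengths, species))  # (3.0, 0.0)
```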

In [None]:
# Let's import everything we will need here
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import accuracy_score

In [None]:
iris_features, iris_labels = load_iris(return_X_y=True, as_frame=True)
# This is automatically split into labels (the thing we want to get out of the model)
# and features (the things the model uses to learn)

# Using `as_frame=True` returns pandas objects: a DataFrame of features and a Series of labels

iris_features.head()

In [None]:
# Let's look at our possible classes:
iris_labels.unique() # Just numbers huh?

# If you look at the documentation, you can see this corresponds to:
# 'setosa', 'versicolor', and 'virginica'.

In [None]:
# Before we start training, let's make sure the model won't know the answers to
# the questions we will use to see how well it did.
# We will reserve a test set to make sure the model can't cheat:

feature_train, feature_test, label_train, label_test = train_test_split(
    iris_features, iris_labels, test_size=.25
) # Hold out 25% of the data for testing; the remaining 75% is used for training

In [None]:
# Now we get to use the classifier

# trees are pretty simple so they're VERY fast to train
tree = DecisionTreeClassifier().fit(feature_train, label_train)

# Let's see how good it is
# `predict` just uses the trained tree to guess the class
tree_predictions = tree.predict(feature_test)
# Accuracy score has a simple definition here: the number of predictions the algorithm got right divided by the total number of predictions it made
accuracy_score(label_test, tree_predictions)
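
As the comment says, accuracy is just the fraction of predictions that match the true labels. A plain python sketch of that definition (the function name and toy labels are made up), which should agree with `accuracy_score` for ordinary label lists:

```python
def manual_accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both inputs need the same length")
    n_correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return n_correct / len(true_labels)

print(manual_accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
```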

Now, let's try a regression task.
For this, we will ask you to load up the `diabetes` dataset from sklearn, and pick a regression algorithm to run on it.
You will need to do a little data preprocessing beforehand.

scikit-learn splits its algorithms into classification and regression pretty cleanly, so you can select a different regression method if you want, but the most standard regressor is [Linear Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) (`LinearRegression`).
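
For intuition about what a linear regressor and `r2_score` compute, here is a numpy sketch on synthetic data (not the diabetes dataset). Ordinary least squares finds the coefficients that minimise the squared error, and $R^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$. The true slope of 3, intercept of 2, noise level, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)  # a line plus gaussian noise

# Add a column of ones so least squares can also fit the intercept
design_matrix = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(design_matrix, y, rcond=None)

predictions = slope * x + intercept

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y - predictions) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(slope, intercept, r2)  # slope near 3, intercept near 2, r2 near 1
```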

In [None]:
# import your packages
from sklearn.datasets import "?" # Find the dataset

from sklearn."?" import "?" # Pick your algorithm

from sklearn.metrics import r2_score # evaluation metric

In [None]:
features, labels = "?"

features_train, features_test, labels_train, labels_test = "?"
model = "?".fit("?")
prediction = model.predict("?")

r2_score("?", prediction)

### Matplotlib

Now that we have a ton of packages to handle data, the little animal brain we have needs something to look at.
Matplotlib is our best shot at that. It is the most widely used plotting and visualization package out there. This little introduction will only scratch the surface of what it can do.

[Matplotlib Documentation](https://matplotlib.org/)

In [None]:
import matplotlib.pyplot as plt # pyplot is the module we'll be using here

In [None]:
# The two most common kinds of plots you will use are scatter and line plots.
# We will use a sine curve to show the difference between the two!

sin_x = np.linspace(-3*np.pi, 3*np.pi, 200) # 200 evenly spaced points between -3pi and 3pi
sin_y = np.sin(sin_x) # The y values! (This computes y = sin(x) for every x at once and stores the results in an array)

plt.plot(sin_x, sin_y)
plt.title("Sine")
plt.ylabel("Sine(x)")
plt.xlabel("x")
plt.show()

In [None]:
# Plot TWO distributions. On the same plot!

# We can do something cool here, where we can see the noise that we have been
# adding to distributions.
# The upper and lower bounds determine how much noise there is:
sin_noise = np.random.default_rng().uniform(-.3, .3, size=200)

noisy_sin_y = sin_y + sin_noise

# The label helps when making a legend:
plt.scatter(sin_x, noisy_sin_y, label="Noisy", color='black')

# Until we call plt.show(), we can keep adding elements to the same plot, just
# like we added axis labels and titles before.

# You don't need to set a color for everything: if you add multiple things to
# the same plot without a color, they get different colors by default
plt.plot(sin_x, sin_y, label='Clean')


plt.legend() # Adds the labels onto the plot

plt.title("Sine")
plt.ylabel("Sine(x)")
plt.xlabel("x")
plt.show()

In [None]:
# Plot diabetes data - See if you can see a relationship between each variable
# and the label

diabetes_data_columns = "?"

for column in diabetes_data_columns:
    x = ""
    y = ""

    assert len(x) == len(y) # The data has to have the same dimensions to work

    plt.scatter(x, y)
    plt.title("?")
    plt.ylabel("?")
    plt.xlabel("?") # All good scientific plots have labels
    plt.show()

## Open an outside datasource


### Download data


Let's grab some data from the [UCI ML Repository](https://archive.ics.uci.edu/)!
It's a fantastic resource for toy datasets to test your skills on.

For this example, let's use the [BEANS dataset][BEANS].

[BEANS]: https://archive.ics.uci.edu/dataset/602/dry+bean+dataset

To get the link to the zip, right click the "download" button and copy the link.

We will use `curl` to download the data to a specific path.

- [`curl` Documentation](https://curl.se/docs/manual.html)

In [None]:
! curl https://archive.ics.uci.edu/static/public/602/dry+bean+dataset.zip --output "DryBeanDataset.zip"

### Write something that opens the data!

This stuff is in a zip, which makes it pretty hard to read in.
Let's unzip it with another command line tool, `unzip`.

Then, we will have a file to work with.
The file with data is actually an excel file, but we can work with that as well since Pandas has a
[`read_excel`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html) method.
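
If the `unzip` command line tool isn't available on your system, python's standard library `zipfile` module can do the same job. This sketch builds a tiny stand-in zip archive in a temporary directory so that it runs on its own (the archive and file names are made up); for the beans data you would open `"DryBeanDataset.zip"` and extract to `"./DryBeanDataset/"` instead:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, "example.zip")

    # Build a small zip archive to stand in for a downloaded dataset
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("data/example.csv", "a,b\n1,2\n")

    # Roughly equivalent to: unzip example.zip -d extracted/
    extract_dir = os.path.join(tmp_dir, "extracted")
    with zipfile.ZipFile(archive_path) as archive:
        member_names = archive.namelist()
        print(member_names)  # ['data/example.csv']
        archive.extractall(extract_dir)

    extracted_file = os.path.join(extract_dir, "data", "example.csv")
    with open(extracted_file) as f:
        extracted_contents = f.read()

print(extracted_contents)
```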

In [None]:
! unzip "DryBeanDataset.zip" -d "./DryBeanDataset/"

In [None]:
beans = pd.read_excel(
    "./DryBeanDataset/DryBeanDataset/Dry_Bean_Dataset.xlsx", engine='openpyxl'
)
# use .head() to see the first 5 rows
beans.head()

## Knowledge Check

First install and import `astropy`. Then, see if you can find where the Sun is at a certain point in time using [astropy.coordinates.get_sun()](https://docs.astropy.org/en/stable/api/astropy.coordinates.get_sun.html#astropy.coordinates.get_sun) and a time defined with `astropy.time.Time()`.


In [None]:
# install the package

In [None]:
# import the package

In [None]:
# check the version

In [None]:
# Find the sun's position
# given this time object:

from astropy.time import Time
# For fun, you can also change this to be today's date.
current_time = Time('2023-01-01', format='iso')
current_time

In [None]:
# Find the position of sun!
"?".get_sun("?")

## Challenge

Use scikit-learn to train a classifier to separate a UCI dataset into its classes.
Visualize the data beforehand!

In [None]:
import "?"

In [None]:
! curl "data url" --output "save location"
! unzip "save location" -d "unzipped location"

In [None]:
# Load the data
loaded_data = pd.read_?("?")
features = "?"
labels = "?"

In [None]:
# Plot! I've given you a histogram but do whatever you think shows the data best

plt.hist("?") # Histogram of the data, documentation is here:
# https://matplotlib.org/stable/gallery/statistics/hist.html#histograms
plt.title("?")
plt.ylabel("Frequency")
plt.xlabel("?")
plt.show()

In [None]:
# preprocessing
features_train, features_test, labels_train, labels_test = "?"

In [None]:
# Fitting and prediction
model = "?".fit("?")
prediction = model.predict("?")

In [None]:
# Evaluate how good your predictions are!

score = "?"(labels_test, prediction)

# Use a plot to visualize those scores:
plt.hist(score)
plt.ylabel("Frequency")
plt.xlabel("?")
plt.title("?")

plt.show()

In [None]:
# You can also plot these predictions, let's do it now
plt.scatter("?", labels_test, label='True')
plt.scatter("?", prediction, label='Predict')

plt.xlabel("?")
plt.ylabel("Y")
plt.title("?")

plt."?"

plt.show()

## Installation

### `python`

Right now you may be thinking, "Don't I already have python installed?". Probably! However, it is best to avoid using the python that ships with your operating system (the system python). This is primarily because operating systems often have critical tools that rely on the system installation being *just so*, and it is better to avoid tampering with it in a way that will cause problems. Also, the version that ships with your system may be outdated (on some older systems it is not even python 3), among other caveats. If you want to use your system python to continue, do so at your own risk. This tutorial will install a fresh python and proceed from there.

<details><summary>

#### Aside: system and language level dependencies

</summary>

Note that some python packages are really wrappers of lower level tools. For instance the `psycopg2` package provides a wrapper for the Postgres database engine. In order to use it you will need a Postgres database running somewhere. It is useful to distinguish these kinds of dependencies, that is, the database engine and the python package. This tutorial will refer to Postgres and similar dependencies as **system level dependencies** and python packages as **language level dependencies**. Installing system level dependencies will differ based on the operating system in use, but this tutorial will endeavor to make sure that, once they are in place, the instructions for the language level dependencies will be the same on each platform. In our case, python is a system level dependency, so we will need to install it.
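
One way to see the distinction from inside python: a system level dependency is usually a program on your `PATH`, which `shutil.which` can look for, while a language level dependency is an importable package, which `importlib.util.find_spec` can look for. The helper names and the tools and packages checked below are only examples:

```python
import importlib.util
import shutil

def has_system_tool(command_name):
    """True if an executable with this name is found on the PATH."""
    return shutil.which(command_name) is not None

def has_python_package(package_name):
    """True if this top-level package can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None

print("git installed:", has_system_tool("git"))
print("psql (Postgres client) installed:", has_system_tool("psql"))
print("psycopg2 importable:", has_python_package("psycopg2"))
print("numpy importable:", has_python_package("numpy"))
```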

</details>

### Installing `python3.11`

Find your operating system below and follow the instructions to install a recent version of python 3. After completing these steps you should be able to open a shell session and run `python3.11 --version` to see the version of python that you installed. If this is not working, try starting again from scratch. Failing that, seek assistance. If your operating system is not listed then this tutorial assumes you either already know how to do this or how to find out how to do it. Good luck!

#### Mac OS

Download a [python3.11 for Mac OS][python-macos] installer from python.org. Once `python3.11` is installed, open a shell prompt and run `python3.11 --version` to confirm that it is installed correctly.

[python-macos]: https://www.python.org/downloads/macos/

#### Ubuntu

On a recent Ubuntu release you should be able to run the following commands with no issues (although you will likely need to run them with `sudo`):

```sh
apt-get update
apt-get --yes install python3.11 python3.11-dev python3.11-venv
```

If you are on an older release and see an error like `E: Unable to locate package python3.11` then you may need the assistance of the [deadsnakes PPA][deadsnakes-ppa] project.

[deadsnakes-ppa]: https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa

#### Windows

While it is by no means impossible to install python natively within Windows and run everything these tutorials cover locally, there are enough differences between Windows and the other platforms that we will punt and suggest that you [enable WSL2 and install Ubuntu][wsl2-ubuntu] and then follow the above Ubuntu instructions.

[wsl2-ubuntu]: https://docs.microsoft.com/en-us/windows/wsl/install#install

### WARNING: Beyond this point, root will not be required.

If you are used to running a command, watching it fail and then running it again with `sudo` in front of it then you should start breaking that habit now. Running `sudo` in improper situations can result in harder to fix problems.

### Python packages and `pip`

<details>
<summary>
<br />

What we did in Colab requires python modules that are not in the python standard library. Python packages provide modules that satisfy these sorts of language level dependencies. Packages are installed with `pip`, which is another command that must be installed at the system level. If you followed the above instructions then you should already have `pip`. Even so, we will call `pip` with the invocation `python3.11 -m pip`, as this guarantees you are using the `pip` that belongs to that exact python installation and avoids many common environment related problems.

</summary>

<a href="https://xkcd.com/1987/"><img src="https://imgs.xkcd.com/comics/python_environment.png" title="The Python environmental protection agency wants to seal it in a cement chamber, with pictorial messages to future civilizations warning them about the danger of using sudo to install random Python packages."></a>
</details>

<details><summary>

#### Aside: How are Anaconda and `conda` different from these steps?

</summary>

An important point to know is that Anaconda manages both system level and language level dependencies. Anaconda handles any compilation and dependency gathering steps for you, whereas the approach outlined in this tutorial so far is a more manual one. The manual approach takes more effort, but it has benefits, particularly when developing your own python packages.

You may have used Anaconda before, and you may already have it installed with a version of python 3. If so, great! You probably do not need to install a new version of python. Instead of making a new virtual environment, you may prefer to make a new conda environment.

See below for instructions on using `conda` instead.
</details>

***

## Using git

### Download the tutorials

Run the below cell to download the tutorials.

Adding a `!` in front of a line in a Jupyter notebook cell runs that line as a shell command (`bash` on most systems), in its own subshell.

In [None]:
! git clone https://github.com/BNL-Fermi-Summer-School-2023/tutorials

### Set up your local work as a git repository



In [None]:
! git init

### Store your work by committing



In [None]:
! git add introduction.ipynb
! git commit -m "YOUR MESSAGE HERE!"

## Using python virtual environments

### Set up a virtual environment
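
The usual way to do this from a shell is `python3.11 -m venv .venv` followed by `source .venv/bin/activate` (the `.venv` directory name is just a common convention). The same standard library `venv` module can also be driven from python itself. This sketch creates an environment in a temporary directory, skipping `pip` to keep it quick, just to show what gets created; the `example-env` name is made up:

```python
import os
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp_dir:
    env_dir = os.path.join(tmp_dir, "example-env")

    # Similar to `python -m venv --without-pip example-env`
    venv.create(env_dir, with_pip=False)

    # Every virtual environment has a pyvenv.cfg file at its top level
    config_exists = os.path.exists(os.path.join(env_dir, "pyvenv.cfg"))
    env_contents = sorted(os.listdir(env_dir))

print(config_exists)  # True
print(env_contents)
```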

## Using conda

### Set up a conda env


In [None]:
! conda create --yes --name [YOUR ENV NAME HERE] python=3.11

### Add a new package!

In [None]:
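# Each `!` line runs in its own subshell, so `conda activate` would not carry
# over to the next line. Point conda at the environment with --name instead:
! conda install --yes --name [THAT SAME ENV NAME] numpy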

In [None]:
! conda install -y openpyxl # You need to add openpyxl to your environment to use
# read_excel. The -y flag means you don't need to confirm any installs

# If you install something and it doesn't show up when you import it, restart your kernel