# Environments

## Objectives


1. Understand git and what it can do
2. Recap last week
3. Nested for loops
4. File paths and how to read data into python environment
5. Python packages and why we need them

## First git.

Git was written by Linus Torvalds, the creator of Linux, as a way of tracking the changes that other developers were submitting to Linux.

At its core that is all it does: it keeps a record of all the changes that have taken place and the current state of a repository (think of a repo as just a big project folder).

It can be used to roll back when something is broken and it can be used to allow two developers to work on the same project without getting in each other's way.

## Git Basics

There are some basic git concepts that will be handy for this course.

We are keen that you are all able to keep the code and notes you make in these projects throughout the course. Storing everyone's notes together would normally be tricky, but not for git!

When working on the same repository, developers will usually open a "branch", which allows them to go off and break as much as they like without interfering with the main project. When they are done breaking things and want to actually contribute for a change, they can ask to merge their branch into the main branch of the project (this is called a pull request).

As they go along, they will be "committing" their changes. A commit is a checkpoint that can be returned to if everything goes wrong (think of it as saving). The difference is that git records a unique identifier for each commit, so anyone can pick it out and look at it or use it.

For this course we are going to ask everyone to create their own branch of the repo in which to keep their notes. You can commit them if you like, or just ignore all of this rubbish until it is necessary.

You can do this by going to the root of the repo (`cd python_club`) and running the git command `git branch <name>`. To pull new lessons from the repo (we are adding them as we go), run `git pull` from the same directory and the new lessons should be downloaded. If you get any errors then please ask and we will fix all your (git-related) problems.

## What you need to do

- `> git checkout -b <YOUR_NAME>`

- `> git config --global user.name "name"`

- `> git config --global user.email "email"`

- `> git add .`

- `> git commit -m 'lesson 1'`

- `> git checkout master`

- `> git pull`

- `> git checkout <YOUR_NAME>`

- `> git rebase master`

In [1]:
# Recap

animal_list = ['dog', 'cat', 'anteater']

for animal in animal_list:
    print(animal)

dog
cat
anteater


For loops iterate over iterables (this is a generic term for any data type that can be iterated over). Circular this definition is. 

A simple rule of thumb is: if it's a list you can for loop over it.
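Lists are not the only iterables. As a small illustration (the names `ages` and `total` are made up for this example), strings, dictionaries, `range` objects and tuples can all be looped over in the same way:

```python
ages = {'dog': 3, 'cat': 5}

for letter in 'cat':        # a string yields its characters one at a time
    print(letter)

for name in ages:           # a dictionary yields its keys
    print(name, ages[name])

for number in range(3):     # range(3) yields 0, 1, 2
    print(number)

total = 0
for value in (10, 20, 30):  # a tuple yields its elements
    total += value
print(total)                # 60
```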

In [2]:
# Exercise

# Calculate the final value of a bank account taking the following variables.

initial_amount = 100
monthly_payments = [10, 20, 10, 20, 50, 20]
interest = 0.15

In [None]:
for payment in monthly_payments:
    initial_amount += payment  # add each payment; applying the interest is left for you

In [None]:
x = 0
x = x + 1  # add one to x the long way
x += 1     # shorthand for the line above; x is now 2

In [1]:
counter = 0
for x in range(10):
    counter += 1
    print(counter)

1
2
3
4
5
6
7
8
9
10


What if we wanted to make a table of data?

A 2 dimensional list would work well for this:

In [3]:
# This is a list of lists
table = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

This can be iterated over using two for loops, one inside the other, known as nested for loops.

In [4]:
result = 0
for row in table:
    for column in row:
        print(column)

1
2
3
4
5
6
7
8
9
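As a rough sketch of how the two loops relate to positions in the table, the built-in `enumerate()` can be used to number the rows and columns as they are visited. The names `row_number`, `column_number` and `middle` are made up for this example:

```python
table = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# enumerate() hands back the position alongside each item
for row_number, row in enumerate(table):
    for column_number, value in enumerate(row):
        print(row_number, column_number, value)

# A single cell can be picked out with two indexes: row first, then column
middle = table[1][1]
print(middle)  # 5
```

The outer loop moves down the rows and the inner loop moves across each row, which is why the row index comes first when indexing.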


In [5]:
# Exercise

# Can you calculate the total sum of each row? Can you write a function to calculate the sum of all the rows in any nested list?

# Bonus points: can you calculate the total sum of each column?

In [6]:
# file paths

Python is useful without any external input but also a bit dull. The real power comes from manipulating data that lives outside of a python program. For example, if we wanted to analyse some data that sat in a CSV file, we would first need to load that CSV into a python object (perhaps a list of lists...).

In this directory is a csv file which we will now try to load into python.

First we will try to do this manually with just the standard library.

The Python standard library has a function called `open()` which can be called on a file path. Historically it was necessary to open a file, use or change its data, and then close it again yourself. Since Python 2.5 there has been a tool called a context manager, which keeps the file open for as long as it is needed and then closes it automatically.

A context manager can be opened using the keyword `with`. 

```python
with open('test.csv') as file:
    # do stuff with the open file here
    pass
```
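To make the comparison concrete, here is a minimal sketch of the manual open/close pattern next to the `with` version. It writes its own throwaway file first so that it runs anywhere; `example.txt` is a made-up name and not a file from this lesson.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')
with open(path, 'w') as file:
    file.write('hello\n')

# The manual way: open, use, and remember to close
file = open(path)
try:
    manual_contents = file.read()
finally:
    file.close()

# The context manager way: the file is closed when the block ends
with open(path) as file:
    with_contents = file.read()

print(file.closed)                       # True
print(manual_contents == with_contents)  # True
```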

Once opened, files have a few methods that can be called. For the purposes of this lesson a useful one is `readlines()`, which returns a list containing every line of the opened file as a string.

The final tool that is needed is a way to add things to a python list once it has been created. This can be done very simply with the `.append()` method.
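To see what `readlines()` gives back, here is a small sketch using a throwaway text file (`animals.txt` is a made-up name, created by the example itself). Note that each line keeps its trailing newline character, which `.strip()` removes:

```python
import os
import tempfile

# Write a small file to read back (made-up data for illustration)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'animals.txt')
with open(path, 'w') as file:
    file.write('dog\ncat\nanteater\n')

with open(path) as file:
    lines = file.readlines()

print(lines)  # ['dog\n', 'cat\n', 'anteater\n']

for line in lines:
    print(line.strip())  # strip() removes the newline at the end of each line
```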

In [7]:
# Append example
patients = []
patients

[]

In [8]:
patients.append('patient_42')

In [9]:
patients

['patient_42']

In [10]:
patient_attendances = {}
patient_attendances

{}

In [11]:
patient_attendances['patient_42'] = 7

In [12]:
patient_attendances

{'patient_42': 7}

In [13]:
# Exercise

# Using the above examples can you read the file test.csv into a nested list in Python?


This is a neat and elegant way of reading data from CSV files into python for analysis. However, it is a bit manual and time-consuming for a simple task.

Here we find the true power of Python though.

Someone has probably already done it before, and you can just use theirs.

In python there is a wide array of packages that are freely available through a service called the Python Package Index.

You may have wondered why you had to run `pipenv install` before you could get anything to work. That is because you needed some packages just to get this notebook to run.

Packages in python are used everywhere. At the top of most notebooks you will see a few `import` statements; these are the packages that are needed in that script.

The way the DIL team do python package and environment management is through a package called `pipenv`. This uses the python package manager `pip` and builds virtual environments based on the contents of the `Pipfile`.

A virtual environment is a walled-off section in which a user can specify the python version and any packages to be installed. Those packages will only be installed in this environment, so a developer can work on multiple projects with different dependencies at the same time.
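As an illustration of the common forms an `import` statement can take, the sketch below uses only modules that ship with Python, so nothing needs installing. The data and the name `fake_file` are made up for the example:

```python
import csv                 # import a whole module
from io import StringIO    # import a single name from a module
import math as m           # import a module under a shorter alias

# StringIO lets a string stand in for an open file
fake_file = StringIO('col1,col2\n1,2\n3,4\n')
rows = list(csv.reader(fake_file))

print(rows)        # [['col1', 'col2'], ['1', '2'], ['3', '4']]
print(m.sqrt(16))  # 4.0
```

Installed packages are imported in exactly the same way; the only difference is that they have to be installed into the environment first.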

When you ran `pipenv shell` you opened a shell inside the virtual environment such that any python code you execute from the shell will now have access to all the python packages that have been installed in that environment.
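One way to check from inside Python whether code is running in a virtual environment is to compare `sys.prefix` with `sys.base_prefix`. This is a sketch that assumes the environment was created by the standard `venv` module or a recent version of `virtualenv`; the variable name `in_virtual_environment` is made up for the example.

```python
import sys

# Inside a virtual environment sys.prefix points at the environment,
# while sys.base_prefix points at the Python it was created from.
in_virtual_environment = sys.prefix != sys.base_prefix

print(sys.executable)          # the python interpreter currently in use
print(in_virtual_environment)  # True inside a virtual environment, False otherwise
```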

If you want to add a new package to the virtual environment then you just have to run `pipenv install <PACKAGE>` and it will be added.

This is all to say that there is a better way of reading CSVs into python: a package called `pandas`. It is one of the most successful python packages around and probably single-handedly responsible for Python's domination of scientific computing over the last 5 years.

With pandas all you have to do is `import pandas as pd` and you are able to run all kinds of clever analysis. This will be covered by Jack next week.

In [14]:
import pandas as pd

test_csv = pd.read_csv('test.csv')

In [15]:
test_csv

Unnamed: 0,col1,col2,col3
0,1,2,3
1,4,5,6
2,7,8,9
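The `Unnamed: 0` column appears when the first header cell of a CSV is empty, which usually means the file was saved with its row numbers included. The sketch below assumes that is the case for `test.csv` and uses made-up data of the same shape rather than the real file. Passing `index_col=0` tells pandas to use that first column as the row index instead:

```python
from io import StringIO

import pandas as pd

# Made-up data shaped like the output above: the first header cell is empty
csv_text = ',col1,col2,col3\n0,1,2,3\n1,4,5,6\n2,7,8,9\n'

default = pd.read_csv(StringIO(csv_text))
indexed = pd.read_csv(StringIO(csv_text), index_col=0)

print(list(default.columns))  # ['Unnamed: 0', 'col1', 'col2', 'col3']
print(list(indexed.columns))  # ['col1', 'col2', 'col3']
print(indexed['col1'].sum())  # 12
```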
