# Session 5
## Pandas

Pandas is a package for working with tabular data. It provides functions for reading, writing and manipulating data held in dataframes, and it is a standard tool in Python data science.

In [None]:
import pandas as pd

We load the data from a CSV file into a dataframe and display the first few rows.

In [None]:
fn_gapminder = "data/gapminder.csv"
gapminder = pd.read_csv(fn_gapminder)
gapminder.head()
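
As a minimal, self-contained sketch of the same loading step, the cell below reads CSV text from memory instead of from `data/gapminder.csv`. The rows are made-up values for illustration (they are not taken from the gapminder file), and the name `toy` is hypothetical.

```python
import io

import pandas as pd

# Made-up CSV content for illustration; not the real gapminder data
csv_text = """country,continent,year,lifeExp
A,Asia,2002,70.1
B,Asia,2007,72.4
C,Europe,2007,65.3
"""

# read_csv accepts a file path or any file-like object
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.head(2))   # first two rows only
print(toy.shape)     # (number of rows, number of columns)
print(toy.dtypes)    # column types inferred by read_csv
```

Checking `shape` and `dtypes` straight after loading is a quick way to confirm that the file was parsed as expected.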

The `describe()` method gives summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numerical columns.

In [None]:
gapminder.describe()

**We can also call methods on the dataframe or on individual columns. For example, missing values, or `NaN`s, are often an issue, so the first thing I do is count how many there are in each column of our data.**

In [None]:
gapminder.isna().sum()
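
To show what this check reports when values really are missing, here is a small sketch on a toy dataframe. The values are made up, and dropping rows with `dropna()` is only one possible way of handling them, not necessarily the right one for your data.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; one lifeExp entry is missing
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "lifeExp": [70.1, np.nan, 65.3],
    "gdpPercap": [1000.0, 2000.0, 3000.0],
})

# isna() marks each cell True/False; sum() then counts the True values per column
na_counts = toy.isna().sum()
print(na_counts)

# One possible follow-up: drop every row that contains at least one NaN
toy_clean = toy.dropna()
print(toy_clean)
```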

**Data is selected with the `.loc[rows, columns]` indexer, though there are some shortcuts, such as `table.colname`.**

In [None]:
gapminder.loc[:, 'lifeExp']

In [None]:
gapminder.lifeExp
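
The sketch below, on a toy dataframe with made-up values, shows a few more `.loc` selections and one case where the attribute shortcut does not work: a column whose name clashes with an existing dataframe method (here a hypothetical column called `pop`).

```python
import pandas as pd

# Toy data with made-up values
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2002, 2007, 2007],
    "lifeExp": [70.1, 72.4, 65.3],
    "pop": [1000, 2000, 3000],
})

# All rows, one column: the result is a Series
life = toy.loc[:, "lifeExp"]

# Rows labelled 0 to 1 (both ends included) and two columns: a DataFrame
subset = toy.loc[0:1, ["country", "lifeExp"]]

# The attribute shortcut returns the same Series as .loc
print(toy.lifeExp.equals(life))

# toy.pop is the DataFrame.pop method, not the column,
# so the column has to be selected with brackets or .loc
population = toy["pop"]
print(population)
```

Unlike ordinary Python slices, label slices in `.loc` include the end label, so `0:1` selects two rows here.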

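**More complicated selections are made with boolean masks: a Series of `True`/`False` values, one per row, which `.loc` uses to keep only the rows marked `True`.**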

In [None]:
asia_mask = gapminder.continent == 'Asia'
asia_mask

In [None]:
gapminder.loc[asia_mask]
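
Masks can be combined with `&` (and), `|` (or) and `~` (not). The sketch below uses a toy dataframe with made-up values; note the parentheses around each comparison, which are needed because `&` and `|` bind more tightly than `==`.

```python
import pandas as pd

# Toy data with made-up values
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["Asia", "Asia", "Europe", "Africa"],
    "year": [2002, 2007, 2007, 2007],
})

# Both conditions must hold
asia_2007 = toy.loc[(toy.continent == "Asia") & (toy.year == 2007)]

# isin() is a compact alternative to chaining several == tests with |
asia_or_europe = toy.loc[toy.continent.isin(["Asia", "Europe"])]

# ~ inverts a mask
not_asia = toy.loc[~(toy.continent == "Asia")]

print(asia_2007)
print(asia_or_europe)
print(not_asia)
```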

**To get the mean GDP per capita for each continent in 2007 we could use the following loop:**

In [None]:
year_mask = gapminder.year == 2007
for continent in gapminder.continent.unique():
    continent_mask = gapminder.continent == continent
    mean_gdp = gapminder.loc[year_mask & continent_mask, 'gdpPercap'].mean()
    print(continent, mean_gdp)

**Or we could use the Pandas dataframe method `.groupby()`, which is designed for exactly this kind of operation:**

In [None]:
groups = gapminder.groupby(['year', 'continent'])
year_continent_gdp = groups.gdpPercap.mean()
year_continent_gdp.loc[2007]
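
Two common variations are sketched below on a toy dataframe with made-up values: filtering to one year before grouping by a single key, and computing several summary statistics at once with `.agg()`.

```python
import pandas as pd

# Toy data with made-up values
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Asia"],
    "year": [2007, 2007, 2007, 2007, 2002],
    "gdpPercap": [1000.0, 3000.0, 20000.0, 30000.0, 500.0],
})

# Filter first, then group by one key: the result is indexed by continent
gdp_2007 = toy.loc[toy.year == 2007].groupby("continent").gdpPercap.mean()
print(gdp_2007)

# Several statistics per (year, continent) group: one result column per statistic
summary = toy.groupby(["year", "continent"]).gdpPercap.agg(["mean", "min", "max"])
print(summary)
```

Grouping by two keys gives a result with a two-level index, which is why `.loc[2007]` can pull out all continents for a single year.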

**Pandas is extremely useful, and will be used throughout this course.**

Tutorials are available here: https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html

# Going further

Because of the length of the session, this Python recap could not cover every module or package that may be useful to you, such as Pandas or NumPy.

To learn about csv or Pandas, you can go over the Data Science course material, which can be found here: https://github.com/pycam/python-data-science
This training material also uses Jupyter. The README file will get you started.

To learn about NumPy: https://numpy.org/

You may be interested in making your own website. A widely used framework is Django: https://www.djangoproject.com/

You may also want to write test scripts to check that your code works. The Python standard library comes with all you need to do this, in the `unittest` module:
https://docs.python.org/3/library/unittest.html
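
As a minimal sketch of what such a test script can look like, the cell below tests a small helper function. The function `mean` and the class `TestMean` are hypothetical names made up for this illustration. The suite is run through an explicit loader and runner so that it also works inside a Jupyter notebook; in a standalone script, calling `unittest.main()` is the more usual choice.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_raises(self):
        # The function is expected to reject empty input
        with self.assertRaises(ValueError):
            mean([])


# Collect the tests from the class and run them, reporting each test by name
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```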

Finally, you may want a different editor for your Python work. A very good one is PyCharm: https://www.jetbrains.com/pycharm/. The free community version is already very good. The professional version offers more, though depending on your work you may not need the extra features that come with it.
