## Let's Revisit (again)
NumPy and pandas do a lot of cool things, so we're going to use pandas to revisit the `diabetes.csv` file, which includes both numeric and non-numeric values, and see what else we can do with it. But first things first: we have to import the pandas package and read the CSV file into a DataFrame.

In [1]:
import pandas as pd    # abbreviations help when you use the package a lot

In [2]:
# Use the read_csv() function to read in the CSV.
# Specify the index column with the index_col argument so that a unique identifier is assigned to each row in the
# DataFrame. You don't need this argument (pandas creates an index by default), but we have a column
# where every value is unique (id), so let's use it!
diabetes_df = pd.read_csv("diabetes.csv", index_col="id")
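
To see what `index_col` changes, here is a minimal sketch that reads the same CSV text twice, once without the argument and once with it. The three rows below are a made-up stand-in for `diabetes.csv` (the ids and values are hypothetical), read from a string so the example runs without the file.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for diabetes.csv (values are made up).
csv_text = """id,chol,location,gender
1001,203.0,Buckingham,female
1002,165.0,Louisa,male
1003,228.0,Louisa,female
"""

# Without index_col, pandas adds a default 0, 1, 2, ... index and keeps id as a column.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col="id", the id values become the row labels instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="id")

print(default_df.index.tolist())   # [0, 1, 2]
print(indexed_df.index.tolist())   # [1001, 1002, 1003]
print(indexed_df.index.is_unique)  # True
```

Using a unique column as the index lets you look up a patient by id later with `loc[]`.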

In [4]:
# "Preview" the dataframe using head() or tail()
diabetes_df.head()

id,chol,stab.glu,hdl,ratio,glyhb,location,age,gender,height,weight,frame,bp.1s,bp.1d,waist,hip
41506,296.0,369,46.0,6.4,16.110001,Louisa,53,male,69.0,173.0,medium,138.0,94.0,35.0,39.0
41507,284.0,89,54.0,5.3,4.39,Louisa,51,female,63.0,154.0,medium,140.0,100.0,32.0,43.0
41510,194.0,269,38.0,5.1,13.63,Louisa,29,female,69.0,167.0,small,120.0,70.0,33.0,40.0
41752,199.0,76,52.0,3.8,4.49,Louisa,41,female,63.0,197.0,medium,120.0,78.0,41.0,48.0
41756,159.0,88,79.0,2.0,,Louisa,68,female,64.0,220.0,medium,100.0,72.0,49.0,58.0


We can do a lot now that the data is displayed in this neat format. What should we do first? It looks like the data was collected at multiple locations, so let's use `value_counts()` to count how many samples were taken at each location.

In [8]:
# Remember to specify which column of the dataframe you want the value counts for.
diabetes_df['location'].value_counts()

Louisa        203
Buckingham    200
Name: location, dtype: int64

In [9]:
# How many patients of each gender were there?
diabetes_df['gender'].value_counts()

female    234
male      169
Name: gender, dtype: int64
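
Two small variations on the counts above may be handy: `value_counts(normalize=True)` returns proportions instead of raw counts, and `pd.crosstab()` counts two columns against each other. This is only a sketch on a hypothetical four-row DataFrame (`toy_df`), not the real data.

```python
import pandas as pd

# Hypothetical toy data, not the real diabetes.csv values.
toy_df = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "female"],
        "location": ["Louisa", "Louisa", "Buckingham", "Louisa"],
    }
)

# Raw counts per category, as in the cells above.
counts = toy_df["gender"].value_counts()

# normalize=True divides each count by the total, giving proportions.
shares = toy_df["gender"].value_counts(normalize=True)

# crosstab counts every location/gender combination in one table.
table = pd.crosstab(toy_df["location"], toy_df["gender"])

print(counts["female"])                # 3
print(shares["female"])                # 0.75
print(table.loc["Louisa", "female"])   # 2
```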

There are several good cheatsheets out there. Here are a few:
- [DataQuest](https://www.dataquest.io/blog/large_files/pandas-cheat-sheet.pdf)
- [PyData](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf)
- [DataCamp](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)

Check them out and pick a couple math functions to try with the data. You can perform operations on every column at once, select columns, select rows, basically whatever you want! Where it gets tricky is how to slice the dataframe and feed it into the functions to get the results that you want.

In [13]:
# Let's average the heights of the patients in rows 10-20.
# First, we use iloc[] to pick the rows we want. iloc counts positions from 0 and excludes the stop value,
# so 10:21 covers positions 10 through 20. Then we specify which column(s) we want.
# Finally, we can do the math.
diabetes_df.iloc[10:21]['height'].mean()

64.0
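
Since slicing is the tricky part, here is a rough side-by-side of the main ways to select rows. `iloc[]` works on positions, `loc[]` works on index labels (and can take the columns in the same call), and a boolean mask selects rows by condition. The DataFrame below is a hypothetical toy with an `id` index, not the real data.

```python
import pandas as pd

# Hypothetical toy data with an id index, similar in shape to diabetes_df.
toy_df = pd.DataFrame(
    {
        "height": [62.0, 64.0, 61.0, 67.0, 68.0],
        "weight": [121.0, 218.0, 256.0, 119.0, 183.0],
        "gender": ["female", "female", "female", "male", "male"],
    },
    index=pd.Index([1000, 1001, 1002, 1003, 1005], name="id"),
)

# iloc selects by position: start included, stop excluded.
by_position = toy_df.iloc[1:3]

# loc selects by label: both ends of the slice are included.
by_label = toy_df.loc[1001:1003, ["height", "weight"]]

# A boolean mask keeps only the rows where the condition is True.
male_heights = toy_df.loc[toy_df["gender"] == "male", "height"]

print(by_position.index.tolist())  # [1001, 1002]
print(by_label.index.tolist())     # [1001, 1002, 1003]
print(male_heights.mean())         # 67.5
```

Note the difference in the slice ends: `iloc[1:3]` returns two rows, while `loc[1001:1003]` returns three.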

In [14]:
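# What happens if we remove the column specification? We get the mean of every numeric column.
# Recent pandas versions (2.0 and later) raise a TypeError here if text columns such as location are included,
# so pass numeric_only=True to skip them.
diabetes_df.iloc[10:21].mean(numeric_only=True)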

chol        213.000000
stab.glu     99.727273
hdl          43.272727
ratio         5.136364
glyhb         5.340909
age          42.909091
height       64.000000
weight      169.818182
bp.1s       125.800000
bp.1d        83.800000
waist        37.454545
hip          42.545455
dtype: float64
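
Two details of this output are worth a quick sketch. Text columns such as `location` are absent from the result, and `mean()` skips missing values (NaN) by default, so a column with gaps is averaged over fewer rows than the others. The example below uses a hypothetical three-row DataFrame; `select_dtypes()` is shown as an alternative way to keep only the numeric columns.

```python
import pandas as pd

# Hypothetical toy data: two complete numeric columns, one with a gap, one text column.
toy_df = pd.DataFrame(
    {
        "height": [62.0, 64.0, 66.0],
        "weight": [120.0, 150.0, 180.0],
        "bp": [120.0, float("nan"), 130.0],
        "frame": ["small", "medium", "large"],
    }
)

# numeric_only=True leaves the text column out of the calculation.
numeric_means = toy_df.mean(numeric_only=True)

# Selecting the numeric columns first gives the same result.
same_means = toy_df.select_dtypes(include="number").mean()

print(numeric_means["height"])  # 64.0
print(numeric_means["bp"])      # 125.0, the NaN is skipped so only two values are averaged
```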

Your turn! Do some exploring with various pandas functions, whether it be math, slicing, dropping columns/rows (careful!) or something else. Get stuck or need help? Just ask! :)
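
If you want to try dropping, here is a small sketch of why it is less risky than it sounds: by default `drop()` and `dropna()` return a new DataFrame and leave the original untouched. The toy DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one missing waist measurement.
toy_df = pd.DataFrame(
    {"height": [62.0, 64.0, 61.0], "waist": [29.0, float("nan"), 49.0]},
    index=pd.Index([1000, 1001, 1002], name="id"),
)

# Drop a column by name; the result is a new DataFrame.
no_waist = toy_df.drop(columns=["waist"])

# Drop a row by its index label.
no_first = toy_df.drop(index=[1000])

# Drop every row that contains a missing value.
complete = toy_df.dropna()

print(list(no_waist.columns))     # ['height']
print(no_first.index.tolist())    # [1001, 1002]
print(complete.index.tolist())    # [1000, 1002]
print(toy_df.shape)               # (3, 2), the original is unchanged
```

The change only sticks if you assign the result back to a variable, so keep the original under its own name while you experiment.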