![](https://snag.gy/h9Xwf1.jpg)

### Importing `pandas`

In [None]:
import pandas as pd
import numpy as np

#### Loading a csv into a DataFrame

In [None]:
drug = pd.read_csv('https://github.com/JamesByers/Datasets/raw/master/drug-use-by-age.csv')

#### Exploring data using DataFrames

In [None]:
drug.head()

In [None]:
drug.tail()

#### Data dimensions

In [None]:
drug.shape

In [None]:
drug.columns

In [None]:
drug['crack-use'].head(3)

In [None]:
drug[['crack-use']].head()

In [None]:
drug.age.head()  # Remember: this will be a Series, not a DataFrame

In [None]:
drug[['age','crack-use']].head()

#### DataFrame vs. Series

<b>Putting a column name in single square brackets returns a pandas Series</b>

In [None]:
type(drug['age'])

<b>Putting a column name in double square brackets returns a DataFrame</b>

In [None]:
type(drug[['crack-use']])
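
As a quick check of the bracket rule, the sketch below compares what single and double brackets return. It uses a small made-up frame (`toy` and its values are hypothetical, not taken from the drug dataset) so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({'age': ['12', '13', '14'],
                    'crack-use': [0.0, 0.0, 0.1]})

single = toy['crack-use']    # single brackets: one column as a pandas Series
double = toy[['crack-use']]  # double brackets: a one-column DataFrame

# A Series is one-dimensional, a DataFrame is two-dimensional
print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)
```

The shapes make the difference visible: the Series has a single dimension, while the DataFrame keeps a column axis even when it holds only one column.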

#### Examining your data with `.info()`

In [None]:
drug.info()

#### Summarizing data with `.describe()`

In [None]:
drug['crack-use'].describe()

In [None]:
drug[['crack-use','alcohol-frequency']].describe()

In [None]:
drug.describe()

Or get one stat at a time. 

In [None]:
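# numeric_only=True skips text columns such as 'age'; pandas 2.x raises an error without it
drug.mean(numeric_only=True)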
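
Single statistics can also be pulled for one column or a handful of columns. The following is a minimal sketch on a hypothetical toy frame (the column names and numbers are invented for illustration). It passes `numeric_only=True` because pandas 2.x no longer skips non-numeric columns silently.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({'age': ['12', '13', '14'],
                    'use': [1.0, 2.0, 6.0],
                    'frequency': [3.0, 5.0, 4.0]})

# Mean of every numeric column; the text column 'age' is left out
col_means = toy.mean(numeric_only=True)

# One statistic for one column
use_median = toy['use'].median()

# Several chosen statistics for several columns at once
stats = toy[['use', 'frequency']].agg(['min', 'max'])

print(col_means)
print(use_median)
print(stats)
```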

**Independent Practice on Diamonds Dataset**

In [None]:
# load diamonds dataset
dia = pd.read_csv('https://github.com/JamesByers/Datasets/raw/master/diamonds.csv')

In [None]:
# column names


In [None]:
# shape


In [None]:
# column data types, number of entries and memory used (seems like a lot of info)


In [None]:
# summary stats 

**Pandas Indexing**

In [None]:
new_index_values = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q']
drug.index = new_index_values

In [None]:
drug.head()

In [None]:
dia.reset_index(drop=True, inplace=True)

In [None]:
dia.head()

In [None]:
subset = drug.loc[['B','C','D','E','F'], ['marijuana-use','marijuana-frequency']]

In [None]:
subset 

In [None]:
drug.iloc[[1,2,3,4,5],[4,5]]

In [None]:
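# .ix was removed in pandas 1.0: turn positions 2-4 into labels, then select everything with .loc
drug.loc[list(drug.index[[2, 3, 4]]) + ['F', 'G'], ['marijuana-use', 'marijuana-frequency']]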

In [None]:
drug.loc[['F','E','D'],['marijuana-use','marijuana-frequency']]
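
To make the label-versus-position distinction concrete, here is a small sketch on a hypothetical toy frame (names and values are made up). `.loc` selects by index and column labels, `.iloc` selects by integer position, and positions can be translated to labels through `toy.index` when both need to be mixed.

```python
import pandas as pd

# Hypothetical toy data with a letter index, for illustration only
toy = pd.DataFrame({'use': [1.1, 2.2, 3.3, 4.4],
                    'frequency': [10, 20, 30, 40]},
                   index=['A', 'B', 'C', 'D'])

by_label = toy.loc[['B', 'C'], ['use']]   # rows labelled B and C, column 'use'
by_position = toy.iloc[[1, 2], [0]]       # rows at positions 1 and 2, column 0

# Label slices include both endpoints; position slices exclude the stop
label_slice = toy.loc['A':'C']       # A, B and C
position_slice = toy.iloc[0:2]       # positions 0 and 1 only

# Mixing positions and labels: convert the positions to labels first
mixed = toy.loc[list(toy.index[[0, 1]]) + ['D']]

print(by_label.equals(by_position))
print(list(mixed.index))
```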

**Using a column to create an index**

In [None]:
drug.index=drug['age']
drug.head()

In [None]:
# very specific indexing: row labelled '26-29', columns at positions 4 and 5
drug.loc['26-29', drug.columns[[4, 5]]]

**Be careful when setting indices. Make sure your index is numeric and unique; otherwise, undoing it can be messy.**

**In the future, don't set 'age' as an index, since it is not numeric. Let's reset our index.**

In [None]:
# reset index
drug.reset_index(drop=True, inplace=True)
# 'drop=False' will make the previous index into a new column

drug.head()
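
The round trip between a column and an index can be sketched on a hypothetical toy frame (made-up values). This version uses `set_index`, which moves the column into the index instead of copying it, so `reset_index()` with the default `drop=False` can restore it as a column without a name clash.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({'age': ['12', '13', '14'],
                    'use': [1.0, 2.0, 3.0]})

indexed = toy.set_index('age')        # 'age' leaves the columns and becomes the index
lookup = indexed.loc['13', 'use']     # label-based lookup on the new index

restored = indexed.reset_index()            # drop=False (default): 'age' returns as a column
dropped = indexed.reset_index(drop=True)    # drop=True: the old index is discarded

print(list(restored.columns))
print(list(dropped.columns))
```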

### Creating DataFrames
The _best way_ to create a DataFrame is to use a dictionary, but remember to keep the lists of values the **same length**.

In [None]:
mydata = pd.DataFrame({'Letters':['A','B','C'], 'Integers':[1,2,3], 'Floats':[2.2, 3.3, 4.4]})
mydata
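
To see why the lengths matter, the sketch below deliberately passes lists of different lengths and catches the resulting error. The column names are illustrative only. A single scalar value is the one exception, since pandas repeats it for every row.

```python
import pandas as pd

# Lists of different lengths cannot be aligned into rows
try:
    pd.DataFrame({'Letters': ['A', 'B', 'C'], 'Integers': [1, 2]})
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)

# A scalar is broadcast to every row, so this works
ok = pd.DataFrame({'Letters': ['A', 'B', 'C'], 'Constant': 0})
print(ok)
```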

#### Examining data types

In [None]:
mydata.dtypes

#### Renaming and Assignment

In [None]:
# changing specific column name
mydata.rename(columns={'Integers':'Ints'},inplace=True)
mydata

In [None]:
mydata.dtypes, mydata.columns

In [None]:
# renaming all columns
mydata.columns=['A','B','C']
mydata

In [None]:
# assigning specific values using indexing.
mydata.loc[1,'B'] = 100
mydata

In [None]:
mydata.loc[:,'A'] = 'foo' # index range all, column 'A'
mydata

In [None]:
mydata.loc[0,['B','C']] = [1000,1] # index 0, columns 'B' & 'C'
mydata

### Basic Plotting with DataFrames

In [None]:
import matplotlib.pyplot as plt

# render plots inline
%matplotlib inline

In [None]:
drug.plot(x='age',y='crack-use')

In [None]:
drug.hist('crack-use')

### Filtering Logic

In [None]:
drug[drug['marijuana-use'] > 20]

In [None]:
drug[drug['marijuana-use']>25][['age','marijuana-frequency']]

In [None]:
drug[(drug['marijuana-use']>20) & (drug['n'] > 4000)]
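
The filters above all rely on a boolean mask: a comparison on a column yields a Series of `True`/`False` values, and indexing with that Series keeps the `True` rows. A minimal sketch on a hypothetical toy frame (invented numbers) shows the mask itself and how `&`, `|` and `~` combine conditions. Each condition needs its own parentheses because these operators bind more tightly than the comparisons.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({'age': ['12', '13', '14', '15'],
                    'n': [1000, 2000, 3000, 5000],
                    'use': [1.0, 5.0, 25.0, 30.0]})

mask = toy['use'] > 20      # boolean Series, one value per row
high_use = toy[mask]        # keeps only the rows where mask is True

both = toy[(toy['use'] > 20) & (toy['n'] > 4000)]       # AND
either = toy[(toy['use'] > 20) | (toy['n'] > 4000)]     # OR
neither = toy[~((toy['use'] > 20) | (toy['n'] > 4000))] # NOT

print(mask.tolist())
print(both['age'].tolist())
```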