### The Anatomy Of A Dataframe
![Dataframe Anatomy](../images/dataframe-anatomy.png)

### pandas is a Python library for working with DataFrames
- Get familiar with the [API reference](https://pandas.pydata.org/pandas-docs/stable/reference/index.html#api), which documents the many objects, functions, and methods for working with DataFrames and Series



In [8]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

### Let's explore these pandas methods, attributes, and accessors
 - read_csv( )
 - .shape
 - .head( )
 - .tail( )
 - .columns
 - .drop( )
 - .rename( )
 - .loc[]
 - .isin( )
 - .iloc[ ]
 - [[ ]]

### Read in the schools data, a CSV file, and examine the shape, head, and tail

In [11]:
schools = pd.read_csv('../data/schools_clean.csv')
schools.head(2)

Unnamed: 0,level,name,zipcode,grade_k,grade_1,grade_2,grade_3,grade_4,grade_5,grade_6,...,hisp,p_islander,white,male,female,econ_disadv,disabled,limited_eng,lat,lng
0,Elementary School,A. Z. Kelley Elementary,37013,153.0,145.0,149.0,180.0,184.0,,,...,206,1.0,212.0,431,421,261,75.0,298.0,36.021817,-86.658848
1,Elementary School,Alex Green Elementary,37189,42.0,50.0,44.0,38.0,24.0,,,...,29,1.0,21.0,115,119,153,21.0,25.0,36.252961,-86.832229


In [12]:
schools.tail(2)

Unnamed: 0,level,name,zipcode,grade_k,grade_1,grade_2,grade_3,grade_4,grade_5,grade_6,...,hisp,p_islander,white,male,female,econ_disadv,disabled,limited_eng,lat,lng
165,Middle School,William Henry Oliver Middle,37211,,,,,,231.0,271.0,...,158,3.0,437.0,487,498,252,112.0,231.0,36.020174,-86.712207
166,Middle School,Wright Middle,37211,,,,,,188.0,216.0,...,534,1.0,104.0,443,367,400,75.0,536.0,36.100109,-86.734133


In [16]:
schools.shape

(167, 29)
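To try `read_csv( )`, `.shape`, `.head( )`, and `.tail( )` without the schools file, here is a minimal sketch that reads a tiny CSV from an in-memory string. The three rows and the name `demo` are made up for illustration and are not the real schools data.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for a CSV file; not the real schools data.
csv_text = """level,name,zipcode
Elementary School,School A,37013
Middle School,School B,37211
High School,School C,37203
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.shape)    # a tuple: (number of rows, number of columns)
print(demo.head(2))  # the first two rows
print(demo.tail(1))  # the last row
```

Note that `.shape` is an attribute (no parentheses), while `.head( )` and `.tail( )` are methods that take an optional row count.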

#### the `columns` attribute shows the column names for the DataFrame

In [19]:
schools.columns

Index(['level', 'name', 'zipcode', 'grade_k', 'grade_1', 'grade_2', 'grade_3',
       'grade_4', 'grade_5', 'grade_6', 'grade_7', 'grade_8', 'grade_9',
       'grade_10', 'grade_11', 'grade_12', 'native_amer', 'asian', 'black',
       'hisp', 'p_islander', 'white', 'male', 'female', 'econ_disadv',
       'disabled', 'limited_eng', 'lat', 'lng'],
      dtype='object')

#### The `iloc[ ]` accessor gets the specified rows and columns by their integer _positions_ (counting from 0; the end of a slice is excluded)

In [22]:
first_five = schools.iloc[0:5, 0:2]

In [24]:
first_five

Unnamed: 0,level,name
0,Elementary School,A. Z. Kelley Elementary
1,Elementary School,Alex Green Elementary
2,Elementary School,Amqui Elementary
3,Elementary School,Andrew Jackson Elementary
4,High School,Antioch High School
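The difference between a position and a row label is easier to see when the two do not match. The sketch below uses a small made-up DataFrame (the name `demo` and its values are assumptions for illustration) with row labels 10, 20, and 30.

```python
import pandas as pd

# Made-up data with a non-default index so positions and labels differ.
demo = pd.DataFrame(
    {
        "level": ["Elementary School", "Middle School", "High School"],
        "name": ["School A", "School B", "School C"],
        "zipcode": [37013, 37211, 37203],
    },
    index=[10, 20, 30],
)

# iloc slices by position: rows 0 and 1, columns 0 and 1 (the end is excluded).
by_position = demo.iloc[0:2, 0:2]
print(by_position)

# The row labels 10 and 20 are kept; iloc only used positions to find the rows.
print(by_position.index.tolist())
```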


#### The `loc[ ]` accessor gets the specified rows and columns by their _labels_ (names), or by a boolean mask with one True/False value per row

In [26]:
middle_schools = schools.loc[schools['level'] == 'Middle School'].head()
middle_schools.shape

(5, 29)

In [None]:
middle_schools.head()

In [None]:
econ_disadv_over_200 = schools.loc[schools.econ_disadv > 200]
econ_disadv_over_200.shape

In [None]:
econ_disadv_over_200.head(3)
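As a rough sketch of the two ways `loc[ ]` is used here, the example below filters rows with a boolean mask and also selects by label. The DataFrame `demo` and its numbers are made up for illustration.

```python
import pandas as pd

# Made-up data; the index labels are 10, 20, 30 rather than 0, 1, 2.
demo = pd.DataFrame(
    {
        "level": ["Elementary School", "Middle School", "High School"],
        "name": ["School A", "School B", "School C"],
        "econ_disadv": [261, 153, 400],
    },
    index=[10, 20, 30],
)

# A comparison on a column produces a boolean Series, one value per row.
mask = demo["econ_disadv"] > 200

# loc keeps the rows where the mask is True; the second argument picks columns.
over_200 = demo.loc[mask, ["name", "econ_disadv"]]
print(over_200)

# Label slices with loc include BOTH endpoints, unlike iloc.
labeled = demo.loc[10:20, "level":"name"]
print(labeled)
```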

#### Use the `.isin( )` method to filter rows by membership in an external data structure, such as a list

In [6]:
my_list = [37201, 37203]
downtown_schools = schools.loc[schools.zipcode.isin(my_list)]
downtown_schools.shape


In [None]:
downtown_schools.head()
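Here is a minimal sketch of what `.isin( )` returns and how the result feeds into `loc[ ]`. The DataFrame `demo`, its school names, and the variable names are made up for illustration.

```python
import pandas as pd

# Made-up data for illustration.
demo = pd.DataFrame(
    {
        "name": ["School A", "School B", "School C", "School D"],
        "zipcode": [37013, 37201, 37203, 37211],
    }
)

downtown_zips = [37201, 37203]

# isin returns a boolean Series: True where the value appears in the list.
mask = demo["zipcode"].isin(downtown_zips)
print(mask.tolist())

# Use the mask to keep matching rows, or ~mask to keep everything else.
downtown = demo.loc[mask]
not_downtown = demo.loc[~mask]
print(downtown)
print(not_downtown)
```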

In [None]:
school_and_type = schools[['name', 'level']]
school_and_type.head(2)
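The double brackets `[[ ]]` above pass a list of column names to the DataFrame. A short sketch, using a made-up DataFrame named `demo`, shows how that differs from single brackets.

```python
import pandas as pd

# Made-up data for illustration.
demo = pd.DataFrame(
    {
        "level": ["Elementary School", "High School"],
        "name": ["School A", "School B"],
        "zipcode": [37013, 37203],
    }
)

# Single brackets with one name return a Series.
one = demo["name"]
print(type(one).__name__)

# Double brackets (a list of names) return a DataFrame, even for one column.
two = demo[["name"]]
print(type(two).__name__)

# The columns come back in the order given in the list.
pair = demo[["name", "level"]]
print(pair.columns.tolist())
```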

#### Drop columns from a DataFrame with the `.drop( )` method; be sure to specify `columns = ` and pass a list of columns to the method

In [None]:
schools.columns

In [None]:
school_and_gender_counts = schools.drop(columns = ['native_amer', 'asian', 'black', 'hisp', 'p_islander', 'white', 
                                              'econ_disadv', 'disabled', 'limited_eng', 'lat', 'lng'])

In [None]:
school_and_gender_counts.columns
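One detail worth checking for yourself: `.drop( )` returns a new DataFrame and leaves the original alone unless you assign the result back. A small sketch with made-up data (the name `demo` and its values are assumptions):

```python
import pandas as pd

# Made-up data for illustration.
demo = pd.DataFrame(
    {
        "name": ["School A", "School B"],
        "male": [431, 115],
        "female": [421, 119],
        "lat": [36.02, 36.25],
        "lng": [-86.66, -86.83],
    }
)

# drop returns a new DataFrame without the listed columns.
trimmed = demo.drop(columns=["lat", "lng"])
print(trimmed.columns.tolist())

# The original still has every column because the result was saved elsewhere.
print(demo.columns.tolist())
```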

#### If the column list is short and you are feeling lazy, you can assign new column names (as a list _with every column in the right order_ ) to the columns attribute

In [None]:
school_and_type.columns

In [None]:
school_and_type.columns = ['school', 'type']
school_and_type.head()
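Assigning to the `columns` attribute only works when the new list has exactly one name per column. The sketch below, on a made-up DataFrame named `demo`, shows a successful assignment and what happens when the list is too short.

```python
import pandas as pd

# Made-up data for illustration.
demo = pd.DataFrame(
    {
        "name": ["School A", "School B"],
        "level": ["Elementary School", "High School"],
    }
)

# Same number of names as columns, in the same order: this works.
demo.columns = ["school", "type"]
print(demo.columns.tolist())

# Too few names: pandas raises a ValueError and leaves the columns unchanged.
mismatch_rejected = False
try:
    demo.columns = ["school"]
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```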

#### If you only want to change the names of a subset of columns, use the `.rename( )` method and pass a dictionary of `{old_name: new_name}` pairs to `columns = `

In [None]:
school_and_gender_counts = school_and_gender_counts.rename(columns = {'level': 'type', 'name': 'school'})
school_and_gender_counts.head()
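A quick sketch of `.rename( )` on a made-up DataFrame named `demo`. Columns not mentioned in the dictionary keep their names, and with the default settings a key that matches no column is silently ignored, so a typo in an old name will not raise an error.

```python
import pandas as pd

# Made-up data for illustration.
demo = pd.DataFrame(
    {
        "level": ["Elementary School", "High School"],
        "name": ["School A", "School B"],
        "zipcode": [37013, 37203],
    }
)

# Keys are current names, values are new names.
# "not_a_column" is a deliberate non-match to show it is ignored by default.
renamed = demo.rename(
    columns={"level": "type", "name": "school", "not_a_column": "ignored"}
)
print(renamed.columns.tolist())

# rename returns a new DataFrame; the original keeps its old names.
print(demo.columns.tolist())
```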

# End of Instruction

### Starting with the schools dataframe, filter for just the High Schools and create a new dataframe called "high_schools".

In [None]:
high_schools = schools.loc[schools['level'] == 'High School']
high_schools.head()

### Now drop the columns named grade_k through grade_8 and overwrite your high_schools dataframe with the results

In [None]:
high_schools = high_schools.drop(columns = ['grade_k', 'grade_1', 'grade_2', 'grade_3',
       'grade_4', 'grade_5', 'grade_6', 'grade_7', 'grade_8'])
# Only the last expression in a cell is displayed, so check the columns after the drop
high_schools.columns


### Create a list named my_zip_codes which contains the zip codes 37203 and 37013.  Next, use the isin() method to filter the schools dataframe and overwrite my_zip_codes with the result.

In [None]:
my_zip_codes = [37203,37013] 
my_zip_codes = schools.loc[schools.zipcode.isin(my_zip_codes)]
my_zip_codes.head()

### Rename the `level` column to "high_school"

In [None]:
my_zip_codes = my_zip_codes.rename(columns = {'level' : 'high_school'})
my_zip_codes.head()

### Use the shape attribute to see how many rows and columns exist in the current dataframe.

In [None]:
my_zip_codes.shape
