#### Notes on 
# DataFrame Data Structure

The DataFrame is conceptually a two-dimensional Series object: it has an index and multiple columns of
content, and each column has a label.
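Here is a minimal sketch of that structure. It assumes the same three student records used below, built here from a dictionary of columns (another input form `pd.DataFrame` accepts). It inspects the row index, the column labels, and one column.

```python
import pandas as pd

# One list per column; the dictionary keys become the column labels
df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

print(df.index.tolist())    # ['school1', 'school2', 'school1']
print(df.columns.tolist())  # ['Name', 'Class', 'Score']
print(df.shape)             # (3, 3) -> (rows, columns)

# Each column is itself a Series that shares the DataFrame's index
score = df['Score']
print(type(score).__name__)          # Series
print(score.index.equals(df.index))  # True
```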

In [1]:
import pandas as pd

In [2]:
# Three school records for students and their class grades.
record1 = pd.Series({'Name': 'Alice',
                        'Class': 'Physics',
                        'Score': 85})
record2 = pd.Series({'Name': 'Jack',
                        'Class': 'Chemistry',
                        'Score': 82})
record3 = pd.Series({'Name': 'Helen',
                        'Class': 'Biology',
                        'Score': 90})

In [3]:
df = pd.DataFrame([record1, record2, record3],
                  index=['school1', 'school2', 'school1'])

df.head()

          Name      Class  Score
school1  Alice    Physics     85
school2   Jack  Chemistry     82
school1  Helen    Biology     90


In [4]:
# An alternative method: use a list of dictionaries, where each dictionary represents a row of data.

students = [{'Name': 'Alice',
              'Class': 'Physics',
              'Score': 85},
            {'Name': 'Jack',
             'Class': 'Chemistry',
             'Score': 82},
            {'Name': 'Helen',
             'Class': 'Biology',
             'Score': 90}]

df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

df.head()

          Name      Class  Score
school1  Alice    Physics     85
school2   Jack  Chemistry     82
school1  Helen    Biology     90


##### Extract Row

In [6]:
# Similar to a Series, we can extract data using the .iloc and .loc attributes.
df.loc['school2']

# The row comes back as a Series: its name is the row's index label ('school2'),
# and the column names become the Series' index.

Name          Jack
Class    Chemistry
Score           82
Name: school2, dtype: object
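The comment above mentions .iloc but only .loc is shown. A minimal sketch, assuming the same data built from a dictionary of columns, shows that both select the same row: .loc by label and .iloc by 0-based position.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# .loc selects by index label, .iloc by integer position (0-based)
by_label = df.loc['school2']
by_position = df.iloc[1]

print(by_position.name)              # school2
print(by_label.equals(by_position))  # True
```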

In [7]:
# We can check the data type of the return using the python type function.
type(df.loc['school2'])

pandas.core.series.Series

In [8]:
# If a row label is not unique (here 'school1' appears twice), .loc returns all matching rows,
# not as a new Series, but as a new DataFrame.

df.loc['school1']

          Name    Class  Score
school1  Alice  Physics     85
school1  Helen  Biology     90
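Whether you get a Series or a DataFrame depends on how many rows match the label. A quick check, assuming the same data, uses `Index.is_unique`:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

print(df.index.is_unique)                # False: 'school1' appears twice
print(type(df.loc['school2']).__name__)  # Series: exactly one matching row
print(type(df.loc['school1']).__name__)  # DataFrame: two matching rows
print(len(df.loc['school1']))            # 2
```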


In [9]:
# One of the strengths of the pandas DataFrame is that you can select data along both axes at once:
# here, the rows labelled 'school1' and the 'Name' column.

df.loc['school1', 'Name']

school1    Alice
school1    Helen
Name: Name, dtype: object
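The shape of a two-axis .loc result follows from what each axis matches. The sketch below assumes the same data. A unique row label and a single column give a scalar. A duplicated label gives a Series. Wrapping the row label in a list keeps the result two-dimensional.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# Unique row label + single column -> a scalar
print(df.loc['school2', 'Score'])           # 82
# Duplicate row label + single column -> a Series (one value per match)
print(df.loc['school1', 'Score'].tolist())  # [85, 90]
# List of row labels + list of columns -> a DataFrame
print(df.loc[['school2'], ['Name', 'Score']].shape)  # (1, 2)
```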

##### Extract Column

In [10]:
# While .iloc and .loc are used for row selection, pandas reserves the
# indexing operator [] applied directly to the DataFrame for column selection.

df['Name']

school1    Alice
school2     Jack
school1    Helen
Name: Name, dtype: object

In [11]:
# Note too that the result of a single column projection is a Series object
type(df['Name'])

pandas.core.series.Series
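The indexing operator also accepts a list of column labels. In that case the result stays a DataFrame. A minimal sketch, assuming the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# A single label returns a Series; a list of labels returns a DataFrame
names = df['Name']
subset = df[['Name', 'Score']]

print(type(names).__name__)     # Series
print(type(subset).__name__)    # DataFrame
print(subset.columns.tolist())  # ['Name', 'Score']
```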

##### Chaining Extract Operators

In [12]:
# Since the result of using the indexing operator is either a DataFrame or Series, you can chain operations together. 
df.loc['school1']['Name']

school1    Alice
school1    Helen
Name: Name, dtype: object

In [13]:
# Use type to check the responses from resulting operations
print(type(df.loc['school1'])) #should be a DataFrame
print(type(df.loc['school1']['Name'])) #should be a Series

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
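For reading data, the chained form and a single .loc call with a (row, column) pair give the same values. Here is a minimal check, assuming the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

chained = df.loc['school1']['Name']  # two indexing steps
single = df.loc['school1', 'Name']   # one .loc call with (row, column)

print(chained.tolist())       # ['Alice', 'Helen']
print(chained.equals(single)) # True
```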


In [14]:
# VV IMP

# Chaining (indexing into the result of another indexing operation) comes with some costs and is best avoided.

# Chaining can cause pandas to return a copy of the DataFrame instead of a view on the DataFrame.
# If you are changing data, this distinction matters: an assignment made through a chain may update
# only the copy and leave the original DataFrame unchanged, which is a common source of errors.
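Here is a sketch of the pitfall, assuming the same data. With the duplicated 'school1' label, `df.loc['school1']` builds a new DataFrame, so a chained assignment never reaches `df`. pandas usually emits a warning here; its exact type depends on the pandas version, and the demo silences it. A single .loc call with a (row, column) pair writes to `df` directly.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# Chained assignment: df.loc['school1'] builds a new, temporary DataFrame,
# so the assignment lands on that object rather than on df
with warnings.catch_warnings():
    # pandas usually warns here (e.g. SettingWithCopyWarning, or
    # ChainedAssignmentError under copy-on-write); silenced for the demo
    warnings.simplefilter('ignore')
    df.loc['school1']['Score'] = 0
after_chained = df['Score'].tolist()
print(after_chained)  # [85, 82, 90] -> df is unchanged

# One .loc call with a (row, column) pair writes to df itself,
# updating every row labelled 'school1'
df.loc['school1', 'Score'] = 0
print(df['Score'].tolist())  # [0, 82, 0]
```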

In [15]:
# Here's another approach: .loc does row selection, and it can take two parameters,
# the row index and a list of column names. .loc also supports slicing; ':' selects all rows.

df.loc[:,['Name', 'Score']]

          Name  Score
school1  Alice     85
school2   Jack     82
school1  Helen     90
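One slicing detail worth knowing, assuming the same data: label slices in .loc include both endpoints, while position slices in .iloc exclude the end, as Python lists do.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# Label slice on the columns: 'Score' (the end label) is included
print(df.loc[:, 'Class':'Score'].columns.tolist())  # ['Class', 'Score']
# Position slice on the rows: position 2 (the end) is excluded
print(df.iloc[0:2].index.tolist())                  # ['school1', 'school2']
```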


##### Dropping Data
The drop function and the del statement

In [16]:
# To delete data from a Series or DataFrame, use the drop function.
# Its first parameter is the index (row) label, or a list of labels, to drop.

# VV IMP

# The drop function doesn't change the DataFrame by default!
# Instead, it returns a copy of the DataFrame with the given rows removed.
# Note that both rows labelled 'school1' are dropped.

df.drop('school1')

         Name      Class  Score
school2  Jack  Chemistry     82


In [17]:
# The original DataFrame is still intact.
df

          Name      Class  Score
school1  Alice    Physics     85
school2   Jack  Chemistry     82
school1  Helen    Biology     90


In [18]:
# Optional parameters:
# inplace=True updates the DataFrame in place instead of returning a copy.
# axis selects what to drop: 0 -> rows (the default), 1 -> columns.

copy_df = df.copy()
copy_df.drop("Name", inplace=True, axis=1)
copy_df

             Class  Score
school1    Physics     85
school2  Chemistry     82
school1    Biology     90
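Instead of `axis=1`, `drop` also accepts the `columns=` and `index=` keywords, which some find easier to read. A minimal sketch, assuming the same data; both calls return new DataFrames:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# Same result as df.drop('Name', axis=1)
no_name = df.drop(columns='Name')
# index= is the row counterpart of columns=
no_school2 = df.drop(index='school2')

print(no_name.columns.tolist())   # ['Class', 'Score']
print(no_school2.index.tolist())  # ['school1', 'school1']
print(df.shape)                   # (3, 3): df itself is untouched
```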


In [19]:
# The del keyword drops a column through the indexing operator.
# Unlike drop, del takes immediate effect on the DataFrame and returns nothing.

del copy_df['Class']
copy_df

         Score
school1     85
school2     82
school1     90


##### Adding new Column

In [20]:
# Adding a new column is as easy as assigning a value to it with the indexing operator.
# A scalar such as None is broadcast to every row.

df['ClassRanking'] = None
df

          Name      Class  Score  ClassRanking
school1  Alice    Physics     85          None
school2   Jack  Chemistry     82          None
school1  Helen    Biology     90          None
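Besides a scalar, you can assign a list with one value per row, or a column computed from existing columns. A minimal sketch, assuming the same data. The 'Year' values and the 'Passed' column with its 85-point threshold are hypothetical, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# A scalar is broadcast to every row
df['ClassRanking'] = None
# A list must supply exactly one value per row (hypothetical values)
df['Year'] = [2, 3, 2]
# A column derived element-wise from another (hypothetical pass mark of 85)
df['Passed'] = df['Score'] >= 85

print(df.columns.tolist())    # ['Name', 'Class', 'Score', 'ClassRanking', 'Year', 'Passed']
print(df['Passed'].tolist())  # [True, False, True]
```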
