<a href="https://colab.research.google.com/github/brunofbpaula/DataScience-UM-Coursera/blob/main/Pandas/DataFrame/DataFrameStructure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DataFrame

This structure is the heart of pandas. Conceptually, it's a two-dimensional Series object: there's an index and multiple columns of content, and each column has a label. In fact, the distinction between a column and a row is really only conceptual. A DataFrame is basically a two-axis labeled array.

In [1]:
import pandas as pd

In [68]:
# Jujutsu's characters, their domain expansions and cursed techniques
character1 = {
    'Name': 'Yuji Itadori',
    'Domain Expansion': None,
    'Cursed Technique': None
}

character2 = {
    "Name": 'Aoi Todo',
    'Domain Expansion': None,
    "Cursed Technique": "Boogie Woogie"
}

character3 = {
    "Name": 'Megumi Fushiguro',
    'Domain Expansion': 'Chimera Shadow Garden',
    "Cursed Technique": "Divine Dogs"
}

# Indexes
jjk_schools = ['Tokyo', 'Kyoto', 'Tokyo']

jjk_df = pd.DataFrame([character1, character2, character3], index=jjk_schools)

jjk_df

Unnamed: 0,Name,Domain Expansion,Cursed Technique
Tokyo,Yuji Itadori,,
Kyoto,Aoi Todo,,Boogie Woogie
Tokyo,Megumi Fushiguro,Chimera Shadow Garden,Divine Dogs
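
A similar table can also be assembled from a list of Series, where each Series is one row. The sketch below is a minimal illustration with only two columns; the names row1, row2 and series_df are made up for this example.

```python
import pandas as pd

# Each Series is one row; its index labels become the column names.
row1 = pd.Series({'Name': 'Yuji Itadori', 'Cursed Technique': None})
row2 = pd.Series({'Name': 'Aoi Todo', 'Cursed Technique': 'Boogie Woogie'})

# The index argument supplies the row labels, as in the dictionary version.
series_df = pd.DataFrame([row1, row2], index=['Tokyo', 'Kyoto'])

print(series_df.shape)          # (2, 2)
print(list(series_df.columns))  # ['Name', 'Cursed Technique']
print(list(series_df.index))    # ['Tokyo', 'Kyoto']
```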


A DataFrame can be built from a list of dictionaries, as jjk_df was, where each dictionary represents a row of data. As with the Series object, we can access a row by its index label through the 'loc' attribute. The same goes for non-unique labels: 'loc' returns every row that carries the label. Manipulation is pretty much the same as for a Series.

In [69]:
print(jjk_df.loc['Kyoto'])

print('\nDataType: {}'.format(type(jjk_df.loc['Kyoto'])))

Name                     Aoi Todo
Domain Expansion             None
Cursed Technique    Boogie Woogie
Name: Kyoto, dtype: object

DataType: <class 'pandas.core.series.Series'>
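
To see the non-unique case next to the unique one, here is a small sketch. It rebuilds a reduced version of the table (only the Name column) so it can run on its own; the variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [{'Name': 'Yuji Itadori'}, {'Name': 'Aoi Todo'}, {'Name': 'Megumi Fushiguro'}],
    index=['Tokyo', 'Kyoto', 'Tokyo'],
)

kyoto = df.loc['Kyoto']  # label appears once  -> a Series (one row)
tokyo = df.loc['Tokyo']  # label appears twice -> a DataFrame (two rows)

print(type(kyoto).__name__)  # Series
print(type(tokyo).__name__)  # DataFrame
print(len(tokyo))            # 2

# 'loc' also accepts a column label as a second argument.
tokyo_names = df.loc['Tokyo', 'Name']
print(list(tokyo_names))     # ['Yuji Itadori', 'Megumi Fushiguro']
```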


One new thing is transposing the matrix. If we want to select a single column, we could transpose the DataFrame and then use 'loc' on the result. Transposing pivots all of the rows into columns and vice versa, and is done with the T attribute.

In [70]:
jjk_df.T

Unnamed: 0,Tokyo,Kyoto,Tokyo
Name,Yuji Itadori,Aoi Todo,Megumi Fushiguro
Domain Expansion,,,Chimera Shadow Garden
Cursed Technique,,Boogie Woogie,Divine Dogs


In [71]:
jjk_df.T.loc['Cursed Technique']

Tokyo             None
Kyoto    Boogie Woogie
Tokyo      Divine Dogs
Name: Cursed Technique, dtype: object

Chaining, that is, indexing on the result of another indexing operation, can come with some costs and is best avoided if it's possible to use another approach. In particular, chaining tends to cause pandas to return a copy of the DataFrame instead of a view on it. That can be slower than necessary, and a change made through the chained result may not reach the original DataFrame.
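
As a rough illustration, the sketch below compares a chained lookup with a single 'loc' call on a reduced two-row table. Both return the same value here; the point is only that the second form does it in one indexing step. The variable names are assumptions for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Aoi Todo', 'Megumi Fushiguro'],
     'Cursed Technique': ['Boogie Woogie', 'Divine Dogs']},
    index=['Kyoto', 'Tokyo'],
)

# Chained: first index the row, then index the result again.
chained = df.loc['Kyoto']['Cursed Technique']

# Single call: row label and column label passed together.
direct = df.loc['Kyoto', 'Cursed Technique']
print(chained == direct)  # True

# A column can also be selected directly, without transposing first.
column = df['Cursed Technique']
print(list(column) == list(df.T.loc['Cursed Technique']))  # True
```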

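The 'loc' attribute also accepts two arguments: a row selector and a column selector. We can use a colon to indicate a full slice from beginning to end, which selects every row. To include multiple columns, we put them in a list.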

In [72]:
jjk_df.loc[:, ['Name', 'Cursed Technique']]

Unnamed: 0,Name,Cursed Technique
Tokyo,Yuji Itadori,
Kyoto,Aoi Todo,Boogie Woogie
Tokyo,Megumi Fushiguro,Divine Dogs
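
One detail worth checking is the difference between passing a single label and passing a list with one label. A minimal sketch on a reduced table (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Yuji Itadori', 'Aoi Todo', 'Megumi Fushiguro']},
    index=['Tokyo', 'Kyoto', 'Tokyo'],
)

as_series = df.loc[:, 'Name']    # single label      -> Series
as_frame = df.loc[:, ['Name']]   # list of labels    -> DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```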


# Dropping Data
We use the drop function. Its main argument is the label, or list of labels, to drop; by default these are index (row) labels. The function doesn't change the DataFrame by default. Instead, it returns a copy of the DataFrame with the given rows removed.

In [73]:
jjk_df.drop('Tokyo')

Unnamed: 0,Name,Domain Expansion,Cursed Technique
Kyoto,Aoi Todo,,Boogie Woogie


In [74]:
# As said before, it won't change the original dataframe
jjk_df

Unnamed: 0,Name,Domain Expansion,Cursed Technique
Tokyo,Yuji Itadori,,
Kyoto,Aoi Todo,,Boogie Woogie
Tokyo,Megumi Fushiguro,Chimera Shadow Garden,Divine Dogs


The drop function has two other useful parameters. The first is inplace: if it is set to True, the DataFrame is updated in place instead of a copy being returned. The second is axis, which selects the axis to drop from. Its default value is 0, indicating the row axis, and it can be changed to 1 if there's a need to drop a column.

In [75]:
# First, let's make a copy of the original DataFrame
copy = jjk_df.copy()

# Now let's drop the CT column
copy.drop('Cursed Technique', inplace=True, axis=1)

copy

Unnamed: 0,Name,Domain Expansion
Tokyo,Yuji Itadori,
Kyoto,Aoi Todo,
Tokyo,Megumi Fushiguro,Chimera Shadow Garden
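
For comparison, here is a sketch of the same column drop without inplace, along with the columns keyword that drop also accepts as an alternative to axis=1. It runs on a reduced table, and the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Yuji Itadori', 'Aoi Todo', 'Megumi Fushiguro'],
     'Cursed Technique': [None, 'Boogie Woogie', 'Divine Dogs']},
    index=['Tokyo', 'Kyoto', 'Tokyo'],
)

# Two equivalent ways to drop a column; both return a new DataFrame.
by_axis = df.drop('Cursed Technique', axis=1)
by_keyword = df.drop(columns=['Cursed Technique'])

print(list(by_axis.columns))     # ['Name']
print(list(by_keyword.columns))  # ['Name']
print(list(df.columns))          # ['Name', 'Cursed Technique'] (unchanged)

# A list of row labels works too; both 'Tokyo' rows are removed.
only_kyoto = df.drop(['Tokyo'])
print(len(only_kyoto))           # 1
```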


There's a second way to drop a column: directly through the indexing operator, using the del keyword. It takes immediate effect on the DataFrame and doesn't return anything.

In [76]:
del copy['Domain Expansion']

copy

Unnamed: 0,Name
Tokyo,Yuji Itadori
Kyoto,Aoi Todo
Tokyo,Megumi Fushiguro
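
Because del changes the object it is applied to, working on a copy keeps the original intact. A short sketch, with made-up variable names and a reduced table:

```python
import pandas as pd

original = pd.DataFrame(
    {'Name': ['Aoi Todo', 'Megumi Fushiguro'],
     'Domain Expansion': [None, 'Chimera Shadow Garden']},
    index=['Kyoto', 'Tokyo'],
)

working = original.copy()          # independent copy of the data
del working['Domain Expansion']    # removes the column from 'working' only

print(list(working.columns))   # ['Name']
print(list(original.columns))  # ['Name', 'Domain Expansion']
```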


# Adding a new column

We just need to assign a value to the new column label using the indexing operator. For instance, if we want to add a new column with a default value of None, we put the column name in square brackets and use the assignment operator.

In [88]:
jjk_df['Clan'] = None

jjk_df

Unnamed: 0,Name,Domain Expansion,Cursed Technique,Clan
Tokyo,Yuji Itadori,,,
Kyoto,Aoi Todo,,Boogie Woogie,
Tokyo,Megumi Fushiguro,Chimera Shadow Garden,Divine Dogs,
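
The assigned value doesn't have to be a single scalar. As a sketch, the example below also adds columns computed from existing ones. 'Name Length' and 'Has Technique' are hypothetical columns invented for this illustration, not part of the original table.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Yuji Itadori', 'Aoi Todo', 'Megumi Fushiguro'],
     'Cursed Technique': [None, 'Boogie Woogie', 'Divine Dogs']},
    index=['Tokyo', 'Kyoto', 'Tokyo'],
)

# A scalar is repeated for every row.
df['Clan'] = None

# A Series computed from another column is aligned on the index.
df['Name Length'] = df['Name'].str.len()              # hypothetical column
df['Has Technique'] = df['Cursed Technique'].notna()  # hypothetical column

print(list(df['Name Length']))    # [12, 8, 16]
print(list(df['Has Technique']))  # [False, True, True]
```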


In [85]:
jjk_df['Name'] == 'Megumi Fushiguro'

Tokyo    False
Kyoto    False
Tokyo     True
Name: Name, dtype: bool
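
A comparison like this produces a boolean Series, often called a mask, with one True or False per row. A minimal sketch of how such a mask might be used to pick rows, on a reduced table with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Yuji Itadori', 'Aoi Todo', 'Megumi Fushiguro'],
     'Cursed Technique': [None, 'Boogie Woogie', 'Divine Dogs']},
    index=['Tokyo', 'Kyoto', 'Tokyo'],
)

mask = df['Name'] == 'Megumi Fushiguro'

# Indexing with the mask keeps only the rows where it is True.
selected = df[mask]
print(list(selected['Name']))  # ['Megumi Fushiguro']

# The mask also works as the row selector of 'loc'.
techniques = df.loc[mask, 'Cursed Technique']
print(list(techniques))        # ['Divine Dogs']
```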