<h1>DataFrames</h1>

<p>A DataFrame is the two-dimensional data structure in Pandas. It stores data in labeled columns, each of which can hold a different type, and labels every row with an index. This layout supports selecting, filtering, grouping, sorting, and joining tables. A DataFrame resembles an Excel spreadsheet or a database table, with additional features that make the data easier to work with in Python.</p>


In [3]:
import pandas as pd
from numpy.random import randn
# randn is a NumPy function that draws random numbers from a standard normal distribution

In [6]:
df = pd.DataFrame(randn(5, 4), index=["A", "B", "C", "D", "E"], columns="W X Y Z".split())
# Using randn, I created a two-dimensional array of random numbers with 5 rows and 4 columns,
# then specified the row index and the column labels.
df

          W         X         Y         Z
A -0.225115 -0.242386  1.535231 -1.443307
B  0.683688  0.232907  0.170702 -0.279215
C  0.087038  1.189413 -1.036464  0.527225
D -0.822524 -0.552841  0.608156 -0.739230
E  1.091635  0.197705  1.319319 -0.176711
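
<p>Because randn is not seeded, rerunning the cell above produces different numbers each time. The following is a minimal sketch of a reproducible variant, assuming NumPy's default_rng generator is an acceptable substitute for randn. The seed value 0 is an arbitrary choice, and the resulting values will not match the table above.</p>

```python
import numpy as np
import pandas as pd

# Seeded generator so the same numbers come back on every run (seed 0 is arbitrary).
rng = np.random.default_rng(0)

df = pd.DataFrame(
    rng.standard_normal((5, 4)),   # 5 rows, 4 columns, standard normal samples
    index=list("ABCDE"),           # row labels
    columns=list("WXYZ"),          # column labels
)

print(df.shape)          # (5, 4)
print(list(df.index))    # ['A', 'B', 'C', 'D', 'E']
print(list(df.columns))  # ['W', 'X', 'Y', 'Z']
```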


In [8]:
# To select data from a DataFrame, I pass a column label,
# unlike a Series, where I pass an index label.
# Selecting a single column returns a Series: a DataFrame is a collection of Series that share an index.

df['W']

A   -0.225115
B    0.683688
C    0.087038
D   -0.822524
E    1.091635
Name: W, dtype: float64

In [9]:
#If I make the same query, but put the column name in a list,
#even if it's just one column, I'll get back a DataFrame with a single column.

df[['W']]

          W
A -0.225115
B  0.683688
C  0.087038
D -0.822524
E  1.091635
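
<p>The difference between the two bracket forms can be checked directly. This is a small sketch on a hypothetical two-row DataFrame (the values are made up to keep the output short), comparing the type and shape of each result.</p>

```python
import pandas as pd

# Hypothetical values, chosen only to keep the example small.
df = pd.DataFrame({"W": [1.0, 2.0], "X": [3.0, 4.0]}, index=["A", "B"])

as_series = df["W"]     # single label -> Series
as_frame = df[["W"]]    # list of labels -> DataFrame, even with one label

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```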


In [10]:
# To create a new column, assign values to a column name that doesn't exist yet.
# In this case, the new column is a copy of column W.

df['New Column'] = df['W']
df

          W         X         Y         Z  New Column
A -0.225115 -0.242386  1.535231 -1.443307   -0.225115
B  0.683688  0.232907  0.170702 -0.279215    0.683688
C  0.087038  1.189413 -1.036464  0.527225    0.087038
D -0.822524 -0.552841  0.608156 -0.739230   -0.822524
E  1.091635  0.197705  1.319319 -0.176711    1.091635


In [15]:
#I can create new columns by performing arithmetic operations.

df['Another column'] = df['W'] + df['Y']
df

          W         X         Y         Z  New Column  Another column
A -0.225115 -0.242386  1.535231 -1.443307   -0.225115        1.310116
B  0.683688  0.232907  0.170702 -0.279215    0.683688        0.854391
C  0.087038  1.189413 -1.036464  0.527225    0.087038       -0.949427
D -0.822524 -0.552841  0.608156 -0.739230   -0.822524       -0.214368
E  1.091635  0.197705  1.319319 -0.176711    1.091635        2.410954
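
<p>Column arithmetic is applied row by row, matching values on the index. As a sketch on hypothetical data, the example below adds a column by direct assignment and also uses the assign method, which returns a new DataFrame and leaves the original unchanged. The column names W_plus_Y and Y_over_W are illustrative only.</p>

```python
import pandas as pd

# Hypothetical values so the results are easy to verify by hand.
df = pd.DataFrame(
    {"W": [1.0, 2.0, 3.0], "Y": [10.0, 20.0, 30.0]},
    index=["A", "B", "C"],
)

# Direct assignment adds the column to df itself.
df["W_plus_Y"] = df["W"] + df["Y"]

# assign() returns a new DataFrame with the extra column; df is not modified.
with_ratio = df.assign(Y_over_W=df["Y"] / df["W"])

print(df["W_plus_Y"].tolist())          # [11.0, 22.0, 33.0]
print(with_ratio["Y_over_W"].tolist())  # [10.0, 10.0, 10.0]
print("Y_over_W" in df.columns)         # False
```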


In [16]:
# I can delete a column using the 'del' statement.
del df['Another column']
df

          W         X         Y         Z  New Column
A -0.225115 -0.242386  1.535231 -1.443307   -0.225115
B  0.683688  0.232907  0.170702 -0.279215    0.683688
C  0.087038  1.189413 -1.036464  0.527225    0.087038
D -0.822524 -0.552841  0.608156 -0.739230   -0.822524
E  1.091635  0.197705  1.319319 -0.176711    1.091635


In [19]:
# I can also remove a column with the 'drop' method.
# axis=1 refers to columns and axis=0 refers to rows.
df.drop('New Column', axis=1)

          W         X         Y         Z
A -0.225115 -0.242386  1.535231 -1.443307
B  0.683688  0.232907  0.170702 -0.279215
C  0.087038  1.189413 -1.036464  0.527225
D -0.822524 -0.552841  0.608156 -0.739230
E  1.091635  0.197705  1.319319 -0.176711


In [21]:
# 'drop' returns a modified copy of the DataFrame, which is what gets displayed,
# but the original remains intact.

df

          W         X         Y         Z  New Column
A -0.225115 -0.242386  1.535231 -1.443307   -0.225115
B  0.683688  0.232907  0.170702 -0.279215    0.683688
C  0.087038  1.189413 -1.036464  0.527225    0.087038
D -0.822524 -0.552841  0.608156 -0.739230   -0.822524
E  1.091635  0.197705  1.319319 -0.176711    1.091635


In [23]:
# I can assign the result of drop to another variable, creating a new DataFrame.

df2 = df.drop('New Column', axis=1)
df2

          W         X         Y         Z
A -0.225115 -0.242386  1.535231 -1.443307
B  0.683688  0.232907  0.170702 -0.279215
C  0.087038  1.189413 -1.036464  0.527225
D -0.822524 -0.552841  0.608156 -0.739230
E  1.091635  0.197705  1.319319 -0.176711


In [24]:
# To make the drop method modify the original DataFrame, I need to pass the parameter inplace=True.
df.drop('New Column', axis=1, inplace=True)
df

          W         X         Y         Z
A -0.225115 -0.242386  1.535231 -1.443307
B  0.683688  0.232907  0.170702 -0.279215
C  0.087038  1.189413 -1.036464  0.527225
D -0.822524 -0.552841  0.608156 -0.739230
E  1.091635  0.197705  1.319319 -0.176711
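
<p>Two variations on the drop calls above may be useful. This is a sketch on hypothetical data: the columns keyword of drop does the same job as axis=1 without requiring an axis number, and reassigning the result to the same variable is an alternative to inplace=True.</p>

```python
import pandas as pd

# Hypothetical values, only to keep the example small.
df = pd.DataFrame(
    {"W": [1, 2], "X": [3, 4], "New Column": [1, 2]},
    index=["A", "B"],
)

by_axis = df.drop("New Column", axis=1)
by_keyword = df.drop(columns="New Column")  # same result as axis=1

print(by_axis.equals(by_keyword))  # True
print(list(df.columns))            # ['W', 'X', 'New Column'] (original untouched)

# Reassigning the returned copy replaces the original without inplace=True.
df = df.drop(columns="New Column")
print(list(df.columns))            # ['W', 'X']
```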


In [29]:
# I can use loc to select rows by index label.
# With a single label, it returns the row as a Series.

df.loc['A']

W   -0.225115
X   -0.242386
Y    1.535231
Z   -1.443307
Name: A, dtype: float64

In [30]:
# With a list of labels, it returns the rows as a DataFrame.

df.loc[['A']]

          W         X         Y         Z
A -0.225115 -0.242386  1.535231 -1.443307


In [31]:
df.loc[['A','B']]

          W         X         Y         Z
A -0.225115 -0.242386  1.535231 -1.443307
B  0.683688  0.232907  0.170702 -0.279215
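
<p>Besides a single label or a list of labels, loc also accepts a label slice. As a sketch on hypothetical data, note that a label slice includes both endpoints, unlike ordinary Python slicing.</p>

```python
import pandas as pd

# Hypothetical values, only to keep the example small.
df = pd.DataFrame(
    {"W": [1.0, 2.0, 3.0], "X": [4.0, 5.0, 6.0]},
    index=["A", "B", "C"],
)

row = df.loc["A"]        # single label -> Series whose index is the column labels
rows = df.loc["A":"B"]   # label slice -> DataFrame, both "A" and "B" included

print(row.name)          # A
print(list(row.index))   # ['W', 'X']
print(list(rows.index))  # ['A', 'B']
```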


In [33]:
# With iloc I can select specific elements by indicating the row and column positions, counting from 0.

df.iloc[0,2] # row A and column Y

1.5352314815337458
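
<p>The same cell can be reached by position with iloc or by label with loc. The sketch below uses hypothetical data to show the equivalence, and also that a positional slice in iloc excludes its end position.</p>

```python
import pandas as pd

# Hypothetical values so each cell is easy to identify.
df = pd.DataFrame(
    {"W": [1.0, 2.0, 3.0], "X": [4.0, 5.0, 6.0], "Y": [7.0, 8.0, 9.0]},
    index=["A", "B", "C"],
)

by_position = df.iloc[0, 2]   # first row, third column
by_label = df.loc["A", "Y"]   # same cell, addressed by labels

print(by_position == by_label)  # True

first_two = df.iloc[0:2]        # positions 0 and 1; position 2 is excluded
print(list(first_two.index))    # ['A', 'B']
```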

In [34]:
# In loc, the second argument refers to the columns,
# so I can pass rows and columns just as in iloc.
# The difference is that iloc takes integer positions and loc takes labels.

df.loc[['A','B'],'X']

A   -0.242386
B    0.232907
Name: X, dtype: float64

In [35]:
df.loc[['A','B'],['X']]

          X
A -0.242386
B  0.232907
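
<p>As with plain bracket selection, whether the column argument of loc is a single label or a list decides whether a Series or a DataFrame comes back. The sketch below uses hypothetical data and also shows the positional equivalent with iloc.</p>

```python
import pandas as pd

# Hypothetical values, only to keep the example small.
df = pd.DataFrame(
    {"W": [1.0, 2.0, 3.0], "X": [4.0, 5.0, 6.0]},
    index=["A", "B", "C"],
)

sub_series = df.loc[["A", "B"], "X"]    # single column label -> Series
sub_frame = df.loc[["A", "B"], ["X"]]   # list of column labels -> DataFrame
by_position = df.iloc[[0, 1], [1]]      # rows 0 and 1, column 1: same selection

print(type(sub_series).__name__)        # Series
print(type(sub_frame).__name__)         # DataFrame
print(sub_frame.equals(by_position))    # True
```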
