<a href="https://colab.research.google.com/github/CarlosLeandro09/DataAnalysisRadiology/blob/main/Um_pouco_de_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Basic stage**

**1.** Essential imports to get started

In [2]:
import pandas as pd
import numpy as np

**2.** One-dimensional labeled data, such as values along a time scale: **Series**

In [3]:
series = pd.Series([np.nan, 0, 1, 2])
series

0    NaN
1    0.0
2    1.0
3    2.0
dtype: float64
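The Series above gets pandas' default integer labels (0 to 3), and the single `NaN` is why the other values print as floats. The sketch below attaches dates as labels instead, which is closer to the "time scale" idea of this step. It also shows that aggregations skip the missing value by default. The variable names `s` and `ts` are illustrative.

```python
import numpy as np
import pandas as pd

# The NaN forces a float64 dtype, so 0, 1, 2 are stored as 0.0, 1.0, 2.0
s = pd.Series([np.nan, 0, 1, 2])
print(s.dtype)  # float64

# Same values, labeled with dates instead of the default 0..3 index
ts = pd.Series([np.nan, 0, 1, 2],
               index=pd.date_range("2020-01-01", periods=4, freq="D"))
print(ts.loc[pd.Timestamp("2020-01-03")])  # 1.0

# Missing values can be counted, and aggregations ignore them by default
print(ts.isna().sum())  # 1
print(ts.mean())        # 1.0, the mean of 0, 1 and 2
```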

**3.** Still on the topic of series, let's work with **dates**...

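`pd.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)`

This is the signature from older pandas releases. Since pandas 1.4, `closed` is deprecated in favor of `inclusive`, and pandas 2.0 removed it.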

In [4]:
datas = pd.date_range("20200101",periods=4,freq="D")
datas

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'], dtype='datetime64[ns]', freq='D')
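The same range can be described in more than one way. pandas needs enough of `start`, `end`, `periods` and `freq` to pin the range down, and it falls back to a daily frequency when only `start` and `end` are given. The sketch below builds the four days above in three ways and adds one other frequency alias, `"MS"` (month start).

```python
import pandas as pd

# The same four days as above, described three different ways
by_periods = pd.date_range("20200101", periods=4, freq="D")
by_end = pd.date_range(start="2020-01-01", end="2020-01-04")      # freq falls back to daily
backwards = pd.date_range(end="2020-01-04", periods=4, freq="D")  # anchored on the last date

print(by_periods.equals(by_end), by_periods.equals(backwards))  # True True

# Other frequency aliases work the same way, e.g. "MS" = month start
months = pd.date_range("2020-01-01", periods=3, freq="MS")
print(months.strftime("%Y-%m-%d").tolist())  # ['2020-01-01', '2020-02-01', '2020-03-01']
```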

**4.** Creating a **DataFrame**

In [5]:
df = pd.DataFrame(np.random.randn(4,4), index = datas, columns = list("ABCD"))
df

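|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 2020-01-01 | 0.377983  | -0.080575 | -1.224774 | -1.431919 |
| 2020-01-02 | 0.080579  | -0.157075 | -1.161359 | -0.472065 |
| 2020-01-03 | -1.087464 | 0.138276  | 0.076402  | -0.251545 |
| 2020-01-04 | 0.221153  | 1.745922  | 1.327153  | 0.433962  |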
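Because `np.random.randn` draws new numbers on every run, the table above changes each time the notebook is executed. The sketch below gives a reproducible alternative using a seeded generator. It assumes NumPy 1.17 or later, where `np.random.default_rng` was introduced. The seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20200101", periods=4, freq="D")

# A seeded Generator makes the "random" table identical on every run
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary
df_a = pd.DataFrame(rng.standard_normal((4, 4)), index=dates, columns=list("ABCD"))

rng = np.random.default_rng(seed=42)  # same seed -> same sequence of numbers
df_b = pd.DataFrame(rng.standard_normal((4, 4)), index=dates, columns=list("ABCD"))

print(df_a.equals(df_b))  # True
```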


In [6]:
df2 = pd.DataFrame({"A":7,
                    "B":pd.Series(1,index=list(range(5)),dtype="float32"),
                    "C":np.array([3]*5,dtype="int32"), 
                    "D":pd.Categorical(["Carro","Coelho","Caipora","Cigarro","Cinema"]),
                    "E":pd.Timestamp("20190204"),
                    "F":"Dragonball"})
df2

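|   | A | B   | C | D       | E          | F          |
|---|---|-----|---|---------|------------|------------|
| 0 | 7 | 1.0 | 3 | Carro   | 2019-02-04 | Dragonball |
| 1 | 7 | 1.0 | 3 | Coelho  | 2019-02-04 | Dragonball |
| 2 | 7 | 1.0 | 3 | Caipora | 2019-02-04 | Dragonball |
| 3 | 7 | 1.0 | 3 | Cigarro | 2019-02-04 | Dragonball |
| 4 | 7 | 1.0 | 3 | Cinema  | 2019-02-04 | Dragonball |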
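In the dictionary above, only `B` (a Series), `C` (an array) and `D` (a Categorical) carry a length. The scalars in `A`, `E` and `F` are repeated for every row, and the row labels 0 to 4 come from the Series in `B`. The small sketch below illustrates that rule, including what happens when every value is a scalar. The frames are illustrative.

```python
import pandas as pd

# Scalars are broadcast to every row; the row labels come from the Series in "B"
df = pd.DataFrame({"A": 7, "B": pd.Series(1.0, index=range(3))})
print(df["A"].tolist())  # [7, 7, 7]

# With only scalars there is no length to infer, so pandas raises a ValueError
try:
    pd.DataFrame({"A": 7, "F": "Dragonball"})
    message = ""
except ValueError as err:
    message = str(err)
print(message)

# Passing an explicit index fixes it
df_scalars = pd.DataFrame({"A": 7, "F": "Dragonball"}, index=range(2))
print(df_scalars.shape)  # (2, 2)
```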


In [7]:
df2.head(3)

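|   | A | B   | C | D       | E          | F          |
|---|---|-----|---|---------|------------|------------|
| 0 | 7 | 1.0 | 3 | Carro   | 2019-02-04 | Dragonball |
| 1 | 7 | 1.0 | 3 | Coelho  | 2019-02-04 | Dragonball |
| 2 | 7 | 1.0 | 3 | Caipora | 2019-02-04 | Dragonball |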


In [8]:
df2.tail(3)

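|   | A | B   | C | D       | E          | F          |
|---|---|-----|---|---------|------------|------------|
| 2 | 7 | 1.0 | 3 | Caipora | 2019-02-04 | Dragonball |
| 3 | 7 | 1.0 | 3 | Cigarro | 2019-02-04 | Dragonball |
| 4 | 7 | 1.0 | 3 | Cinema  | 2019-02-04 | Dragonball |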
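`head` and `tail` return five rows by default, and a negative `n` counts from the opposite end. The quick check below uses a hypothetical 10-row frame.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

print(len(df.head()))             # 5: n defaults to 5
print(df.tail(3)["x"].tolist())   # [7, 8, 9]
print(len(df.head(-2)))           # 8: every row except the last 2
print(df.tail(-2)["x"].iloc[0])   # 2: every row except the first 2
```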


In [9]:
df2.index

Int64Index([0, 1, 2, 3, 4], dtype='int64')

In [10]:
df2.columns

Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

In [11]:
# to_numpy() drops the index and column labels and returns only the values as a NumPy array
df.to_numpy()

array([[ 0.37798319, -0.08057534, -1.2247735 , -1.43191929],
       [ 0.08057924, -0.15707529, -1.1613592 , -0.4720646 ],
       [-1.0874643 ,  0.13827593,  0.07640192, -0.25154547],
       [ 0.2211527 ,  1.74592232,  1.32715267,  0.43396162]])
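`df` holds only floats, so `to_numpy()` returns a `float64` array. When the columns have different dtypes, as in `df2`, NumPy has to fall back to a common type, usually `object`. The two frames in this minimal check are illustrative.

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame(np.ones((2, 2)), columns=["A", "B"])
mixed = pd.DataFrame({"A": [7, 7], "F": ["Dragonball", "Dragonball"]})

# A single common dtype exists -> a plain float64 array
print(numeric.to_numpy().dtype)  # float64

# Integers and strings share no numeric dtype -> NumPy falls back to object
print(mixed.to_numpy().dtype)    # object
```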

In [12]:
df2.dtypes

A             int64
B           float32
C             int32
D          category
E    datetime64[ns]
F            object
dtype: object

In [13]:
df2.shape

(5, 6)
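`shape` is a plain `(rows, columns)` tuple, and `dtypes` can drive column selection through `select_dtypes`. The sketch below rebuilds only columns `A` to `D` of `df2` to keep it short.

```python
import numpy as np
import pandas as pd

# A reduced copy of df2 (columns A to D only)
df2 = pd.DataFrame({"A": 7,
                    "B": pd.Series(1, index=list(range(5)), dtype="float32"),
                    "C": np.array([3] * 5, dtype="int32"),
                    "D": pd.Categorical(["Carro", "Coelho", "Caipora", "Cigarro", "Cinema"])})

n_rows, n_cols = df2.shape  # tuple unpacking
print(n_rows, n_cols)       # 5 4

# Keep only the numeric columns (int64, float32, int32); the categorical D is excluded
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))  # ['A', 'B', 'C']
```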

**5.** Adding **columns** to a DataFrame

In [14]:
df2["G"] = pd.Series("RX",index=list(range(5)),dtype="str")
df2

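|   | A | B   | C | D       | E          | F          | G  |
|---|---|-----|---|---------|------------|------------|----|
| 0 | 7 | 1.0 | 3 | Carro   | 2019-02-04 | Dragonball | RX |
| 1 | 7 | 1.0 | 3 | Coelho  | 2019-02-04 | Dragonball | RX |
| 2 | 7 | 1.0 | 3 | Caipora | 2019-02-04 | Dragonball | RX |
| 3 | 7 | 1.0 | 3 | Cigarro | 2019-02-04 | Dragonball | RX |
| 4 | 7 | 1.0 | 3 | Cinema  | 2019-02-04 | Dragonball | RX |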
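Assigning a scalar is enough to fill a column, so the helper Series above is optional. The sketch below, on a small hypothetical frame, shows two related points. First, `assign` returns a new DataFrame instead of modifying the original. Second, assigning a Series aligns it on the index, so rows whose labels it lacks become `NaN`.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 7, 7]})

# A scalar is broadcast to every row
df["G"] = "RX"

# assign() returns a new DataFrame and leaves df unchanged
df_new = df.assign(H=lambda d: d["A"] * 2)
print(list(df.columns), list(df_new.columns))  # ['A', 'G'] ['A', 'G', 'H']

# A Series is aligned on the index: row 2 has no matching label, so it gets NaN
df["I"] = pd.Series(["x", "y"], index=[0, 1])
print(df["I"].isna().sum())  # 1
```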


**6.** **Operations** between columns

In [15]:
df2["Soma"] = df2["A"] + df2["C"]
df2

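|   | A | B   | C | D       | E          | F          | G  | Soma |
|---|---|-----|---|---------|------------|------------|----|------|
| 0 | 7 | 1.0 | 3 | Carro   | 2019-02-04 | Dragonball | RX | 10   |
| 1 | 7 | 1.0 | 3 | Coelho  | 2019-02-04 | Dragonball | RX | 10   |
| 2 | 7 | 1.0 | 3 | Caipora | 2019-02-04 | Dragonball | RX | 10   |
| 3 | 7 | 1.0 | 3 | Cigarro | 2019-02-04 | Dragonball | RX | 10   |
| 4 | 7 | 1.0 | 3 | Cinema  | 2019-02-04 | Dragonball | RX | 10   |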
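Column arithmetic is vectorized: `+` works element by element over the whole column, with no explicit loop. The sketch below shows a few other common operations. The column names `Razao` and `Total` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 7, 7], "C": [3, 3, 3]})

df["Soma"] = df["A"] + df["C"]             # element-wise sum, stays int64
df["Razao"] = df["A"] / df["C"]            # true division always gives float64
df["Total"] = df[["A", "C"]].sum(axis=1)   # row-wise sum over a list of columns

print(df["Soma"].tolist())                 # [10, 10, 10]
print((df["Total"] == df["Soma"]).all())   # True
```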


In [16]:
# Transpose
df2.T

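|      | 0                   | 1                   | 2                   | 3                   | 4                   |
|------|---------------------|---------------------|---------------------|---------------------|---------------------|
| A    | 7                   | 7                   | 7                   | 7                   | 7                   |
| B    | 1                   | 1                   | 1                   | 1                   | 1                   |
| C    | 3                   | 3                   | 3                   | 3                   | 3                   |
| D    | Carro               | Coelho              | Caipora             | Cigarro             | Cinema              |
| E    | 2019-02-04 00:00:00 | 2019-02-04 00:00:00 | 2019-02-04 00:00:00 | 2019-02-04 00:00:00 | 2019-02-04 00:00:00 |
| F    | Dragonball          | Dragonball          | Dragonball          | Dragonball          | Dragonball          |
| G    | RX                  | RX                  | RX                  | RX                  | RX                  |
| Soma | 10                  | 10                  | 10                  | 10                  | 10                  |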
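Because `df2` mixes numbers, categories, dates and strings, each column of the transpose holds values of several types, so pandas stores them all as `object`. Transposing back does not restore the original dtypes by itself, but `infer_objects()` can recover them. The sketch below uses a hypothetical two-column frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 7], "F": ["Dragonball", "Dragonball"]})

# Each column of the transpose mixes an int and a str -> object dtype
print(df.T.dtypes.tolist())

# Transposing twice restores the shape, but column A is still object
back = df.T.T
print(back["A"].dtype)                   # object
print(back.infer_objects()["A"].dtype)   # int64
```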


**7.** **Concatenating** DataFrames

In [18]:
# Note: this reassigns df2, replacing the DataFrame built in steps 4 to 6
df1 = pd.DataFrame(np.random.randn(2,2), index = pd.date_range("20190104",periods=2,freq="D"), columns = list("AB"))
df2 = pd.DataFrame(np.random.randn(2,2), index = pd.date_range("20190106",periods=2,freq="D"), columns = list("AB"))
df3 = pd.DataFrame(np.random.randn(2,2), index = pd.date_range("20190108",periods=2,freq="D"), columns = list("AB"))

In [20]:
combinacao = pd.concat([df1,df2,df3],keys=["df1","df2","df3"])
combinacao

|     |            | A         | B         |
|-----|------------|-----------|-----------|
| df1 | 2019-01-04 | -1.942876 | 0.281525  |
| df1 | 2019-01-05 | 0.094521  | 0.029968  |
| df2 | 2019-01-06 | 0.500834  | -1.647706 |
| df2 | 2019-01-07 | 0.774551  | -1.103098 |
| df3 | 2019-01-08 | 0.442636  | -0.734281 |
| df3 | 2019-01-09 | 0.011706  | 1.962418  |
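The `keys` argument is what creates the outer `df1`/`df2`/`df3` index level. The sketch below shows other common `pd.concat` options, using constant-valued frames so the result is predictable:

- Without `keys`, the dates form a single-level index.
- `ignore_index=True` replaces the dates with 0 to n-1.
- `axis=1` places frames side by side and aligns rows on the index, so dates present in only one frame get `NaN`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((2, 2)), index=pd.date_range("20190104", periods=2, freq="D"), columns=list("AB"))
df2 = pd.DataFrame(np.ones((2, 2)), index=pd.date_range("20190106", periods=2, freq="D"), columns=list("AB"))

# Without keys: a single-level index that keeps the original dates
stacked = pd.concat([df1, df2])
print(stacked.index.nlevels, stacked.shape)  # 1 (4, 2)

# ignore_index=True discards the dates and renumbers rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# axis=1: frames side by side, aligned on the index; dates missing from a frame become NaN
side_by_side = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])
print(side_by_side.shape)               # (4, 4)
print(side_by_side.isna().sum().sum())  # 8
```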


In [21]:
# Selecting a single column (the result keeps both index levels)
combinacao["A"]

df1  2019-01-04   -1.942876
     2019-01-05    0.094521
df2  2019-01-06    0.500834
     2019-01-07    0.774551
df3  2019-01-08    0.442636
     2019-01-09    0.011706
Name: A, dtype: float64
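A single label in brackets returns a Series, as above. A list of labels returns a DataFrame, even when the list holds only one column. The quick check below uses a small illustrative frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((2, 2)), columns=list("AB"))

# Single label -> Series (one-dimensional)
print(type(df["A"]).__name__)    # Series
# List of labels -> DataFrame (two-dimensional), even with a single column
print(type(df[["A"]]).__name__)  # DataFrame
print(df[["A", "B"]].shape)      # (2, 2)
```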

In [23]:
# Selecting by key (the outer index level created by keys=)
combinacao.loc["df1"]

|            | A         | B        |
|------------|-----------|----------|
| 2019-01-04 | -1.942876 | 0.281525 |
| 2019-01-05 | 0.094521  | 0.029968 |
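Passing only the outer key to `.loc` returns that frame's rows with the key level dropped. Going deeper, a full `(key, date)` tuple reaches a single row, and `xs` selects on the inner level instead. The sketch below uses deterministic stand-in data: the frames are hypothetical, built with `np.arange` instead of random numbers.

```python
import numpy as np
import pandas as pd

# Three 2x2 frames with predictable values: df1 -> 0..3, df2 -> 10..13, df3 -> 20..23
frames = [pd.DataFrame(np.arange(4.0).reshape(2, 2) + 10 * i,
                       index=pd.date_range(start, periods=2, freq="D"),
                       columns=list("AB"))
          for i, start in enumerate(["20190104", "20190106", "20190108"])]
combinacao = pd.concat(frames, keys=["df1", "df2", "df3"])

# Outer key only: the rows of df2, indexed by date
print(combinacao.loc["df2"].shape)  # (2, 2)

# Full (key, date) tuple plus a column label: a single value
value = combinacao.loc[("df2", pd.Timestamp("2019-01-07")), "A"]
print(value)  # 12.0

# xs on the inner level: every row dated 2019-01-08, whatever its key
cross = combinacao.xs(pd.Timestamp("2019-01-08"), level=1)
print(cross.index.tolist())  # ['df3']
```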


**8.** Applying **merge**