# What is the pandas library

pandas is a Python library for loading, cleaning, and analyzing tabular data. It can read data from many sources, such as CSV files, HTML tables, and SQL databases. For data work in Python, it is indispensable.

pandas is built around two structures: the "Series" and the "DataFrame". According to the official documentation (https://pandas.pydata.org/docs/reference/index.html), a Series is a one-dimensional labeled array that can hold data of any type. A DataFrame is a two-dimensional data structure, similar to a table, made of rows and columns.

Once you have imported a dataset with pandas, it becomes easy to do things like:

- Extract statistical information
    - What are the mean, median, maximum, and minimum values?
    - What is the distribution of your variables?
    - What is the correlation between any two variables?
- Export the data to a new file format
- Plot many different kinds of charts
- Feed machine learning models built with Scikit-learn
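A few of the tasks listed above can be sketched in a handful of lines. This is only an illustration: the `toy` DataFrame and its column names are invented for the example, and an in-memory buffer stands in for a real output file.

```python
import io

import pandas as pd

# Hypothetical toy data, invented only for this illustration.
toy = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 70, 74],
})

# Summary statistics: count, mean, std, min, quartiles (50% is the median), max.
summary = toy.describe()

# Pearson correlation between two columns.
corr = toy["hours_studied"].corr(toy["score"])

# Export to CSV; a StringIO buffer is used here instead of a file path.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing a file name such as `"output.csv"` to `to_csv` instead of the buffer would write the same content to disk.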

pandas is built on top of another extremely popular library, NumPy. Anyone who has already used NumPy will find many similarities in pandas.


To install pandas from the terminal, use either of the following:

1 - conda install pandas 
2 - pip install pandas 

In [16]:
import pandas as pd
import numpy as np 

## Series 
"pd.Series(data, index)" creates a Series. It takes two main parameters: the data and the index. The index holds the labels shown on the left, which identify each element; they can be positions, letters, and so on. If no index is passed, pandas uses integers starting at 0.

In [17]:
labels = ['a', 'b', 'c']
my_data = [10, 20, 30]
my_dict = {'a': 10, 'b': 20, 'c': 30}  # named my_dict to avoid shadowing the built-in dict

pd.Series(data=my_data)  # without explicit index labels
pd.Series(data=my_data, index=labels)  # with index labels


a    10
b    20
c    30
dtype: int64

In [11]:
pd.Series(my_dict)  # from a dictionary: the keys become the index and the values become the data

a    10
b    20
c    30
dtype: int64

In [12]:
serie1 = pd.Series(['Brasil', 'EUA', 'France', 'Japan'],[1,2,3,4])
serie1

1    Brasil
2       EUA
3    France
4     Japan
dtype: object

In [13]:
serie2 = pd.Series([1,2,5,4], ['Brasil', 'USA', 'Italy', 'Japan'])
serie2

Brasil    1
USA       2
Italy     5
Japan     4
dtype: int64

In [18]:
# How to access the elements associated with the index labels
serie2['Brasil']
serie3 = pd.Series(data=labels)
serie3[0]  # with no index given, pandas uses the default integer index 0, 1, 2, so 0 selects the first element


'a'
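Square brackets can be ambiguous when a Series has non-integer labels, so a more explicit alternative is to use the `.loc` (by label) and `.iloc` (by position) accessors. The sketch below rebuilds a Series like `serie2` so that it runs on its own; the variable names are chosen only for this example.

```python
import pandas as pd

# Same data as serie2 above, rebuilt so this block is self-contained.
countries = pd.Series([1, 2, 5, 4], index=["Brasil", "USA", "Italy", "Japan"])

# .loc selects by index label.
by_label = countries.loc["Italy"]

# .iloc selects by integer position (0-based), regardless of the labels.
by_position = countries.iloc[2]
```

Both expressions point at the same element here, because "Italy" is the third label.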

## DataFrames 
```
how to create: pd.DataFrame(data=<data>, index=<row labels>, columns=<column labels>)
```
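As a minimal sketch of that constructor, the block below builds a small DataFrame from random numbers. The row labels "A" to "E" and the column labels "1" to "5" are assumptions, chosen to resemble the small 5 x 5 tables that appear in some of the outputs further down; the name `example_df` is also just for this illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(101)  # fixed seed so the random values are reproducible

# 5 rows x 5 columns of standard-normal values.
# The row and column labels are assumptions made for this example.
example_df = pd.DataFrame(
    data=np.random.randn(5, 5),
    index=list("ABCDE"),
    columns=list("12345"),
)
```

Note that the column labels are strings, which is why a column is selected with `example_df["2"]` rather than `example_df[2]`.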

In [26]:
from numpy.random import randn 
np.random.seed(101)
# creating a DataFrame by loading a CSV file:

df = pd.read_csv("/Users/filipesamuel/Desktop/dataScienceFromScratch/dataSets/Student_Stress_Monitoring_Datasets/Stress_Dataset.csv")
df 

Unnamed: 0,Gender,Age,Have you recently experienced stress in your life?,Have you noticed a rapid heartbeat or palpitations?,Have you been dealing with anxiety or tension recently?,Do you face any sleep problems or difficulties falling asleep?,Have you been dealing with anxiety or tension recently?.1,Have you been getting headaches more often than usual?,Do you get irritated easily?,Do you have trouble concentrating on your academic tasks?,...,Are you facing any difficulties with your professors or instructors?,Is your working environment unpleasant or stressful?,Do you struggle to find time for relaxation and leisure activities?,Is your hostel or home environment causing you difficulties?,Do you lack confidence in your academic performance?,Do you lack confidence in your choice of academic subjects?,Academic and extracurricular activities conflicting for you?,Do you attend classes regularly?,Have you gained/lost weight?,Which type of stress do you primarily experience?
0,0,20,3,4,2,5,1,2,1,2,...,3,1,4,1,2,1,3,1,2,Eustress (Positive Stress) - Stress that motiv...
1,0,20,2,3,2,1,1,1,1,4,...,3,2,1,1,3,2,1,4,2,Eustress (Positive Stress) - Stress that motiv...
2,0,20,5,4,2,2,1,3,4,2,...,2,2,2,1,4,1,1,2,1,Eustress (Positive Stress) - Stress that motiv...
3,1,20,3,4,3,2,2,3,4,3,...,1,1,2,1,2,1,1,5,3,Eustress (Positive Stress) - Stress that motiv...
4,0,20,3,3,3,2,2,4,4,4,...,2,3,1,2,2,4,2,2,2,Eustress (Positive Stress) - Stress that motiv...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838,0,21,3,4,2,3,5,1,5,4,...,2,3,3,3,4,1,2,2,2,Eustress (Positive Stress) - Stress that motiv...
839,1,19,3,2,1,2,2,1,2,3,...,1,1,1,3,2,1,2,3,1,No Stress - Currently experiencing minimal to ...
840,1,19,4,4,3,4,3,2,2,3,...,2,2,2,2,3,1,4,5,3,Eustress (Positive Stress) - Stress that motiv...
841,0,20,5,4,3,4,3,4,4,4,...,2,2,1,4,3,5,4,5,1,Eustress (Positive Stress) - Stress that motiv...


In [None]:
df["2"]  # selecting the whole column "2" (returns a Series)
df[["2", "3"]]  # selecting columns "2" and "3" by passing a list of names (returns a DataFrame)


Unnamed: 0,2,3
A,0.628133,0.907969
B,-0.848077,0.605965
C,-0.589001,0.188695
D,0.190794,1.978757
E,1.693723,-1.706086


In [None]:
# creating a new column

df["new_colum"] = df["2"] + df["3"]  # "new_colum" holds the element-wise sum of columns "2" and "3"
df

Unnamed: 0,1,2,3,4,5,new_colum
A,2.70685,0.628133,0.907969,0.503826,0.651118,1.536102
B,-0.319318,-0.848077,0.605965,-2.018168,0.740122,-0.242112
C,0.528813,-0.589001,0.188695,-0.758872,-0.933237,-0.400305
D,0.955057,0.190794,1.978757,2.605967,0.683509,2.169552
E,0.302665,1.693723,-1.706086,-1.159119,-0.134841,-0.012363


In [None]:
# removing a column from the DataFrame 

df.drop("new_colum", axis=1, inplace=True)  # axis=1 means drop a column; axis=0 (the default) would drop a row
            # inplace=True modifies df itself.
            # Without it, drop returns a new DataFrame and df stays unchanged.
df 

Unnamed: 0,1,2,3,4,5
A,2.70685,0.628133,0.907969,0.503826,0.651118
B,-0.319318,-0.848077,0.605965,-2.018168,0.740122
C,0.528813,-0.589001,0.188695,-0.758872,-0.933237
D,0.955057,0.190794,1.978757,2.605967,0.683509
E,0.302665,1.693723,-1.706086,-1.159119,-0.134841
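The difference between dropping with and without `inplace=True` can be sketched as follows. The `frame` DataFrame and its columns are invented for this example.

```python
import pandas as pd

# Hypothetical two-column frame used only for this illustration.
frame = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Without inplace, drop returns a new DataFrame and leaves `frame` untouched.
without_y = frame.drop("y", axis=1)

# columns="y" is an equivalent, more explicit spelling of axis=1.
also_without_y = frame.drop(columns="y")
```

Reassigning the result (`frame = frame.drop(columns="y")`) is a common alternative to `inplace=True`.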


In [None]:
# accessing rows by label

df.loc["A"] 

1    2.706850
2    0.628133
3    0.907969
4    0.503826
5    0.651118
Name: A, dtype: float64
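Besides `.loc`, which selects by label, pandas also offers `.iloc`, which selects by integer position. The sketch below uses a small invented DataFrame (`frame`) to show both, along with selecting a single cell and a subset of rows and columns.

```python
import pandas as pd

# Hypothetical frame with string row and column labels, for illustration only.
frame = pd.DataFrame(
    {"1": [10, 20, 30], "2": [40, 50, 60]},
    index=["A", "B", "C"],
)

row_by_label = frame.loc["A"]        # row whose label is "A"
row_by_position = frame.iloc[0]      # first row, whatever its label
cell = frame.loc["B", "2"]           # single value at row "B", column "2"
subset = frame.loc[["A", "C"], ["1"]]  # rows "A" and "C", column "1" only
```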

In [None]:
# using conditional selection

df > 0


Unnamed: 0,1,2,3,4,5
A,True,True,True,True,True
B,False,False,True,False,True
C,True,False,True,False,False
D,True,True,True,True,True
E,True,True,False,False,False


In [None]:
df[(df["1"] > 0) & (df["2"] < 1)]  # keeps only the rows where column "1" > 0 and column "2" < 1

Unnamed: 0,1,2,3,4,5
A,2.70685,0.628133,0.907969,0.503826,0.651118
C,0.528813,-0.589001,0.188695,-0.758872,-0.933237
D,0.955057,0.190794,1.978757,2.605967,0.683509
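Conditions on columns produce boolean Series, which are combined element by element with `&` (and), `|` (or), and `~` (not) rather than with the Python keywords `and`, `or`, and `not`. Each condition needs its own parentheses. The sketch below uses an invented DataFrame (`frame`) to show the three operators.

```python
import pandas as pd

# Hypothetical values chosen so each filter keeps a different set of rows.
frame = pd.DataFrame(
    {"1": [2.0, -1.0, 0.5], "2": [0.5, 3.0, -2.0]},
    index=["A", "B", "C"],
)

mask = (frame["1"] > 0) & (frame["2"] < 1)   # both conditions must hold
both = frame[mask]

either = frame[(frame["1"] > 0) | (frame["2"] > 1)]  # at least one holds

negated = frame[~mask]  # rows where the combined condition is False
```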


## Checking the dimensions of the DataFrame

The file we imported looks much like a simple Excel table, made up of rows and columns. To see the size of this "table", that is, its shape, just run "df.shape". The result is a tuple in the form (rows, columns).

In [23]:
df.shape

(5, 5)
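Because the shape is a plain tuple, it can be unpacked into separate row and column counts. A minimal sketch, using an invented DataFrame named `frame`:

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns.
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is (number of rows, number of columns).
n_rows, n_cols = frame.shape

# len() of a DataFrame also returns the number of rows.
row_count = len(frame)
```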

## Getting to know the data

Once you have loaded your dataset into pandas, the DataFrame structure offers many built-in attributes and methods that make data exploration much easier.

Two of the most used methods in the library, which you will call in practically every project, are "df.head()" and "df.tail()".


When we import a dataset, we usually want a quick look at a few entries, just to get a sense of the data we will be dealing with. This is easily done with:

- df.head() - shows the first 5 entries of the dataset
- df.tail() - shows the last 5 entries of the dataset
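Both methods also accept an optional number of rows, with 5 being only the default. A short sketch on an invented ten-row DataFrame (`frame`):

```python
import pandas as pd

# Hypothetical frame with the values 0..9 in a single column.
frame = pd.DataFrame({"value": range(10)})

default_head = frame.head()   # first 5 rows (the default)
first_three = frame.head(3)   # first 3 rows
last_two = frame.tail(2)      # last 2 rows
```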

In [27]:
df.head()

Unnamed: 0,Gender,Age,Have you recently experienced stress in your life?,Have you noticed a rapid heartbeat or palpitations?,Have you been dealing with anxiety or tension recently?,Do you face any sleep problems or difficulties falling asleep?,Have you been dealing with anxiety or tension recently?.1,Have you been getting headaches more often than usual?,Do you get irritated easily?,Do you have trouble concentrating on your academic tasks?,...,Are you facing any difficulties with your professors or instructors?,Is your working environment unpleasant or stressful?,Do you struggle to find time for relaxation and leisure activities?,Is your hostel or home environment causing you difficulties?,Do you lack confidence in your academic performance?,Do you lack confidence in your choice of academic subjects?,Academic and extracurricular activities conflicting for you?,Do you attend classes regularly?,Have you gained/lost weight?,Which type of stress do you primarily experience?
0,0,20,3,4,2,5,1,2,1,2,...,3,1,4,1,2,1,3,1,2,Eustress (Positive Stress) - Stress that motiv...
1,0,20,2,3,2,1,1,1,1,4,...,3,2,1,1,3,2,1,4,2,Eustress (Positive Stress) - Stress that motiv...
2,0,20,5,4,2,2,1,3,4,2,...,2,2,2,1,4,1,1,2,1,Eustress (Positive Stress) - Stress that motiv...
3,1,20,3,4,3,2,2,3,4,3,...,1,1,2,1,2,1,1,5,3,Eustress (Positive Stress) - Stress that motiv...
4,0,20,3,3,3,2,2,4,4,4,...,2,3,1,2,2,4,2,2,2,Eustress (Positive Stress) - Stress that motiv...


In [28]:
df.tail() 

Unnamed: 0,Gender,Age,Have you recently experienced stress in your life?,Have you noticed a rapid heartbeat or palpitations?,Have you been dealing with anxiety or tension recently?,Do you face any sleep problems or difficulties falling asleep?,Have you been dealing with anxiety or tension recently?.1,Have you been getting headaches more often than usual?,Do you get irritated easily?,Do you have trouble concentrating on your academic tasks?,...,Are you facing any difficulties with your professors or instructors?,Is your working environment unpleasant or stressful?,Do you struggle to find time for relaxation and leisure activities?,Is your hostel or home environment causing you difficulties?,Do you lack confidence in your academic performance?,Do you lack confidence in your choice of academic subjects?,Academic and extracurricular activities conflicting for you?,Do you attend classes regularly?,Have you gained/lost weight?,Which type of stress do you primarily experience?
838,0,21,3,4,2,3,5,1,5,4,...,2,3,3,3,4,1,2,2,2,Eustress (Positive Stress) - Stress that motiv...
839,1,19,3,2,1,2,2,1,2,3,...,1,1,1,3,2,1,2,3,1,No Stress - Currently experiencing minimal to ...
840,1,19,4,4,3,4,3,2,2,3,...,2,2,2,2,3,1,4,5,3,Eustress (Positive Stress) - Stress that motiv...
841,0,20,5,4,3,4,3,4,4,4,...,2,2,1,4,3,5,4,5,1,Eustress (Positive Stress) - Stress that motiv...
842,0,19,4,3,1,3,2,1,1,4,...,1,1,3,1,1,2,2,5,4,Eustress (Positive Stress) - Stress that motiv...


In [29]:
df.columns

Index(['Gender', 'Age', 'Have you recently experienced stress in your life?',
       'Have you noticed a rapid heartbeat or palpitations?',
       'Have you been dealing with anxiety or tension recently?',
       'Do you face any sleep problems or difficulties falling asleep?',
       'Have you been dealing with anxiety or tension recently?.1',
       'Have you been getting headaches more often than usual?',
       'Do you get irritated easily?',
       'Do you have trouble concentrating on your academic tasks?',
       'Have you been feeling sadness or low mood?',
       'Have you been experiencing any illness or health issues?',
       'Do you often feel lonely or isolated?',
       'Do you feel overwhelmed with your academic workload?',
       'Are you in competition with your peers, and does it affect you?',
       'Do you find that your relationship often causes you stress?',
       'Are you facing any difficulties with your professors or instructors?',
       'Is your working env

### Knowing the type of variable stored in each column is essential. For example:

- When a column holds revenue, expenses, or profit, we usually want to work with float values.
- When we are dealing with years (2017, 2018, 2019), we will want to work with int values.
- When we have full timestamps (2019-12-30 07:37), we will want to use the datetime format, so that we can manipulate the dataset properly.

To see the variable type of each column, use "df.dtypes".
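When a column is loaded with the wrong type (for example, numbers read as text), it can be converted. The sketch below is only an illustration under stated assumptions: the `raw` DataFrame and its column names are invented, and it uses `astype` and `pd.to_datetime` to produce the three types mentioned above.

```python
import pandas as pd

# Hypothetical data where every column was read as text.
raw = pd.DataFrame({
    "revenue": ["10.5", "20.0"],
    "year": ["2018", "2019"],
    "timestamp": ["2019-12-30 07:37", "2019-12-31 08:00"],
})

raw["revenue"] = raw["revenue"].astype(float)         # text -> float
raw["year"] = raw["year"].astype(int)                 # text -> int
raw["timestamp"] = pd.to_datetime(raw["timestamp"])   # text -> datetime

types = raw.dtypes  # one dtype per column
```

Once a column is a datetime, parts of the date become available through the `.dt` accessor, for example `raw["timestamp"].dt.year`.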

In [31]:
df.dtypes

Gender                                                                   int64
Age                                                                      int64
Have you recently experienced stress in your life?                       int64
Have you noticed a rapid heartbeat or palpitations?                      int64
Have you been dealing with anxiety or tension recently?                  int64
Do you face any sleep problems or difficulties falling asleep?           int64
Have you been dealing with anxiety or tension recently?.1                int64
Have you been getting headaches more often than usual?                   int64
Do you get irritated easily?                                             int64
Do you have trouble concentrating on your academic tasks?                int64
Have you been feeling sadness or low mood?                               int64
Have you been experiencing any illness or health issues?                 int64
Do you often feel lonely or isolated?               