# Pandas 

## Contents
- Fundamentals and I/O
- Selection, filtering, and indexing
- Data cleaning and preparation
- Transformations and functions
- Grouping, pivots, and windows
- Joins, concat, and reshaping
- Time series
- Text and categoricals
- Performance and debugging
- Visualization with Matplotlib

# Imports

In [None]:
import numpy as np
import pandas as pd
from pathlib import Path

# Paths

In [None]:
np.random.seed(42)
data_dir = Path('static/data/')
data_dir.mkdir(parents=True, exist_ok=True)

# Fundamentals and I/O

### Creating Series and DataFrames

![Series](static/img/serie.png) ![DataFrame](static/img/dataframe.png)

In [13]:
s = ["x","y","z"]
series = pd.Series(s, name="S")
print(series)

0    x
1    y
2    z
Name: S, dtype: object


In [14]:
a = [1,2,3]
b = ["x","y","z"]
df = pd.DataFrame({"A":a, "B":b})
print(df)

   A  B
0  1  x
1  2  y
2  3  z
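
As a small complement to the two constructors above, the following sketch builds a Series with an explicit label index and a DataFrame from a list of records. The names `scores`, `records`, and `df_records`, and all values, are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Series with an explicit (hypothetical) label index instead of the default 0..n-1
scores = pd.Series([10, 20, 30], index=["a", "b", "c"], name="scores")

# DataFrame built from a list of records, one dict per row
records = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}, {"A": 3, "B": "z"}]
df_records = pd.DataFrame(records)

print(scores["b"])       # label-based access on the Series
print(df_records.shape)  # (rows, columns)
```

Building from records is convenient when rows arrive one at a time (for example, from an API), while the dict-of-lists form used above suits data that is already column-oriented.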


### Reading data

In [20]:
# Path to the CSV file
ruta_csv = "static/csv/winemag-data_first150k.csv"
# Read the CSV (fields are separated by semicolons)
df_csv = pd.read_csv(ruta_csv, delimiter=";")
df_csv.head(2)
# Reading Excel
# ruta_excel = "static/excel/mi_excel.xlsx"
# df_excel = pd.read_excel(ruta_excel, sheet_name="Hoja1")
# Reading JSON
# ruta_json = "static/json/mi_json.json"
# df_json = pd.read_json(ruta_json)
# Reading HTML (read_html returns a list of DataFrames, one per table)
# ruta_html = "static/html/mi_html.html"
# df_html = pd.read_html(ruta_html)
# Reading Parquet
# ruta_parquet = "static/parquet/mi_parquet.parquet"
# df_parquet = pd.read_parquet(ruta_parquet)

|   | country | designation | points | price | province | region_1 | region_2 | variety | winery | last_year_points |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US | Martha's Vineyard | 96.0 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz | 94 |
| 1 | Spain | Carodorum Selección Especial Reserva | 96.0 | 110.0 | Northern Spain | Toro |  | Tinta de Toro | Bodega Carmen Rodríguez | 92 |
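
To show how the separator argument affects parsing without depending on a file on disk, here is a minimal sketch that reads semicolon-separated text from an in-memory buffer and writes it back. The sample content in `raw` and the names `df_demo` and `buffer` are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content standing in for a file on disk
raw = "country;points;price\nUS;96;235.0\nSpain;96;110.0\n"

# sep=";" (or its alias delimiter=";") tells the parser how fields are split
df_demo = pd.read_csv(io.StringIO(raw), sep=";")

# Round trip: write back with the same separator and without the index column
buffer = io.StringIO()
df_demo.to_csv(buffer, sep=";", index=False)

print(df_demo.dtypes)
```

If the separator is omitted for a file like this, pandas falls back to the comma and would load each row as a single column, which is usually the first thing to check when a freshly read DataFrame has only one column.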


### Selecting columns

In [22]:
columnas = ['country', 'designation', 'points', 'price', 'province', 'region_1','region_2', 'variety', 'winery', 'last_year_points']
columnas

['country',
 'designation',
 'points',
 'price',
 'province',
 'region_1',
 'region_2',
 'variety',
 'winery',
 'last_year_points']

In [23]:
# A single column label returns a Series
df_csv['country'].head(2)

0       US
1    Spain
Name: country, dtype: object

In [25]:
columnas_filtradas = ['country', 'designation', 'price']
df_csv[columnas_filtradas].head(2)
# df_2 = df_csv[columnas_filtradas].head(2)

|   | country | designation | price |
|---|---|---|---|
| 0 | US | Martha's Vineyard | 235.0 |
| 1 | Spain | Carodorum Selección Especial Reserva | 110.0 |
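
The two cells above differ in what they return: a single label gives a Series, while a list of labels gives a DataFrame. The sketch below makes that explicit on a tiny, made-up frame (`df_demo` and its values are illustrative, not taken from the wine dataset).

```python
import pandas as pd

# Small hypothetical frame used only to illustrate the return types
df_demo = pd.DataFrame({"country": ["US", "Spain"], "price": [235.0, 110.0]})

as_series = df_demo["country"]    # single label -> Series
as_frame = df_demo[["country"]]   # list of labels -> DataFrame, even with one column

print(type(as_series).__name__, type(as_frame).__name__)
```

The distinction matters downstream: methods that expect two-dimensional input generally need the double-bracket form.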


### General DataFrame information

In [26]:
df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 144037 entries, 0 to 144036
Data columns (total 10 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   country           144035 non-null  object 
 1   designation       100211 non-null  object 
 2   points            144032 non-null  float64
 3   price             130641 non-null  float64
 4   province          144030 non-null  object 
 5   region_1          120192 non-null  object 
 6   region_2          58378 non-null   object 
 7   variety           144032 non-null  object 
 8   winery            144032 non-null  object 
 9   last_year_points  144037 non-null  int64  
dtypes: float64(2), int64(1), object(7)
memory usage: 11.0+ MB
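
`info()` reports non-null counts, so the number of missing values has to be worked out by subtraction. One possible complement, sketched here on a small made-up frame (`df_demo`, `missing`, and `missing_pct` are illustrative names), is to count missing values directly with `isna()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values
df_demo = pd.DataFrame({
    "country": ["US", "Spain", None],
    "price": [235.0, np.nan, np.nan],
    "last_year_points": [94, 92, 90],
})

# Missing values per column, as a count and as a percentage of rows
missing = df_demo.isna().sum()
missing_pct = df_demo.isna().mean() * 100

print(missing)
print(missing_pct.round(1))
```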


In [28]:
df_csv.describe()

|   | points | price | last_year_points |
|---|---|---|---|
| count | 144032.0 | 130641.0 | 144037.0 |
| mean | 87.873424 | 33.123399 | 89.998452 |
| std | 3.215821 | 36.368177 | 6.05024 |
| min | 80.0 | 4.0 | 80.0 |
| 25% | 86.0 | 16.0 | 85.0 |
| 50% | 88.0 | 24.0 | 90.0 |
| 75% | 90.0 | 40.0 | 95.0 |
| max | 100.0 | 2300.0 | 100.0 |
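
By default `describe()` on a mixed DataFrame summarizes only the numeric columns, as in the output above. As a hedged sketch on made-up data (`df_demo`, `text_summary`, and `price_summary` are illustrative names), a text column can be described on its own, and custom percentiles can be requested for a numeric one.

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column
df_demo = pd.DataFrame({
    "country": ["US", "Spain", "US", "France"],
    "price": [235.0, 110.0, 90.0, 65.0],
})

# For a text column, describe() reports count, unique, top, and freq
text_summary = df_demo["country"].describe()

# Custom percentiles replace the default 25% / 75% rows
price_summary = df_demo["price"].describe(percentiles=[0.1, 0.9])

print(text_summary)
print(price_summary)
```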


### Type conversion