<p><img src="https://i.postimg.cc/cCjTSn8r/ss-cumf.png" alt="" width="1280" height="300" /></p>

# **PANDAS**

Pandas is an open-source Python library that provides data structures and tools for data analysis. It is widely used in data science to work efficiently and flexibly with structured data such as tables. Its two main data structures are the DataFrame and the Series, which make cleaning, manipulating, and analyzing data straightforward.
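
As a minimal sketch of the two structures just mentioned, the snippet below builds a Series and a DataFrame by hand. The variable names `ages` and `people` are illustrative assumptions and are not used elsewhere in this tutorial.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([34, 45, 27], index=["John", "Sally", "Alyssa"], name="Age")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
people = pd.DataFrame({"Name": ["John", "Sally", "Alyssa"], "Age": [34, 45, 27]})

print(ages["Sally"])        # look up a value by its label
print(type(people["Age"]))  # selecting one column gives back a Series
print(people.shape)         # (number of rows, number of columns)
```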

## **RECOMMENDATIONS**

To get the most out of this tutorial, you should be comfortable with the following:

1. Libraries, packages, and modules.
2. The difference between relative and absolute paths.
3. Key concepts about functions (parameters; optional, positional, and default arguments; etc.).
4. What magic commands are.
5. How notebooks work in general.
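
Two of the prerequisites above, relative versus absolute paths and default arguments, can be sketched with the standard library's `pathlib`. The helper `describe_path` and its `base` parameter are hypothetical names made up for this illustration.

```python
from pathlib import Path


def describe_path(filename, base=None):
    """Return (is_absolute, resolved_path).

    `base` is an optional argument: when omitted it defaults to the
    current working directory.
    """
    path = Path(filename)
    base = Path.cwd() if base is None else Path(base)
    # A relative path only makes sense with respect to some base directory.
    resolved = path if path.is_absolute() else base / path
    return path.is_absolute(), resolved


relative_flag, resolved = describe_path("salaries.csv")
absolute_flag, _ = describe_path(Path.cwd() / "salaries.csv")

print(relative_flag, absolute_flag)
print(resolved)
```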



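## **CREATING THE DEMO FILE**

The `%%writefile` cell magic creates the dataset. It must be the first line of the cell, and every line after it is written to the file, so the cell contains no comments.

In [None]:
%%writefile salaries.csv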
Name,Salary,Age
John,50000,34
Sally,120000,45
Alyssa,80000,27

Overwriting salaries.csv


## **IMPORTING THE LIBRARY**

In [None]:
# run `pip install pandas` first on your own machine
import pandas as pd

## **READING FILES**

In [None]:
df = pd.read_csv("salaries.csv", sep=",")
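
To try `read_csv` without depending on a file on disk, the sketch below reads the same rows from an in-memory string. `csv_text`, `demo`, and `only_two` are illustrative names; `usecols` is a standard `read_csv` parameter that restricts which columns are loaded.

```python
import io

import pandas as pd

csv_text = "Name,Salary,Age\nJohn,50000,34\nSally,120000,45\nAlyssa,80000,27\n"

# sep="," is the default, so it can be omitted for comma-separated data.
demo = pd.read_csv(io.StringIO(csv_text))

# Load only some of the columns.
only_two = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"])

print(demo.shape)
print(list(only_two.columns))
```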

## **VIEWING DATA**

In [None]:
df

     Name  Salary  Age
0    John   50000   34
1   Sally  120000   45
2  Alyssa   80000   27


## **GETTING DATAFRAME INFO**

In [None]:
df.describe()

             Salary        Age
count           3.0        3.0
mean   83333.333333  35.333333
std    35118.845843   9.073772
min         50000.0       27.0
25%         65000.0       30.5
50%         80000.0       34.0
75%        100000.0       39.5
max        120000.0       45.0
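
The statistics reported by `describe()` can be reproduced one at a time, which helps clarify what each row means. This sketch assumes the same three salaries as the demo file; `std` is the sample standard deviation (`ddof=1`) and the percentiles use linear interpolation by default.

```python
import pandas as pd

salary = pd.Series([50000, 120000, 80000], name="Salary")

sample_std = salary.std()               # ddof=1, the value describe() reports
population_std = salary.std(ddof=0)     # divides by n instead of n - 1
first_quartile = salary.quantile(0.25)  # the "25%" row
median = salary.quantile(0.50)          # the "50%" row

print(round(sample_std, 6), round(population_std, 6), first_quartile, median)
```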


In [None]:
# use the index labels of the summary to select specific rows
df.describe().loc[['count','max']]

         Salary   Age
count       3.0   3.0
max    120000.0  45.0


In [None]:
df.info()  # similar to DESCRIBE in SQL

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Salary  3 non-null      int64 
 2   Age     3 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 200.0+ bytes


## **ACCESSING ATTRIBUTES**

In [None]:
df["Name"]

0      John
1     Sally
2    Alyssa
Name: Name, dtype: object

In [None]:
df.Name

0      John
1     Sally
2    Alyssa
Name: Name, dtype: object

In [None]:
df[["Name", "Age"]]

     Name  Age
0    John   34
1   Sally   45
2  Alyssa   27


In [None]:
columnas = ["Name", "Age"]
df[columnas]

     Name  Age
0    John   34
1   Sally   45
2  Alyssa   27


In [None]:
df.Age.max()

45

In [None]:
df["Age"].max()

45

In [None]:
df["Age"].min()

27

In [None]:
df["Age"].count()

3

In [None]:
len(df["Age"])

3

In [None]:
df["Age"].mean()

35.333333333333336

## **FILTERING AND RETRIEVING DATA**

In [None]:
# filtering: df[column and condition]
# column: df["Age"]
# condition: df["Age"] > 29
df["Age"] > 29

0     True
1     True
2    False
Name: Age, dtype: bool

In [None]:
# & means AND (element-wise)
# | means OR (element-wise)

In [None]:
df[df["Age"] > 29]

    Name  Salary  Age
0   John   50000   34
1  Sally  120000   45


In [None]:
# wrap each condition in parentheses
df[(df["Age"] > 29) & (df["Salary"] >= 100000)]

    Name  Salary  Age
1  Sally  120000   45
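
The other element-wise operators follow the same pattern: `|` for OR and `~` for NOT. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `>=` in Python. A minimal sketch, rebuilding the demo data inline under the illustrative name `people`:

```python
import pandas as pd

people = pd.DataFrame(
    {
        "Name": ["John", "Sally", "Alyssa"],
        "Salary": [50000, 120000, 80000],
        "Age": [34, 45, 27],
    }
)

# Boolean masks can be stored in variables and combined afterwards.
older = people["Age"] > 29
high_paid = people["Salary"] >= 100000

either = people[older | high_paid]  # OR: rows matching at least one condition
not_older = people[~older]          # NOT: rows where the condition is False

print(either["Name"].tolist())
print(not_older["Name"].tolist())
```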


In [None]:
list(df["Name"].unique())

['John', 'Sally', 'Alyssa']

In [None]:
list(df["Name"].unique()) * 5

['John',
 'Sally',
 'Alyssa',
 'John',
 'Sally',
 'Alyssa',
 'John',
 'Sally',
 'Alyssa',
 'John',
 'Sally',
 'Alyssa',
 'John',
 'Sally',
 'Alyssa']

In [None]:
df.Age.values.tolist()

[34, 45, 27]

In [None]:
datos = "\n".join(df.Name.values.tolist())
print(datos)

John
Sally
Alyssa


## **USING INDEXES AND SLICES**

In [None]:
# loc[]: select by index label (value)
# iloc[]: select a range of rows by position
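
A small sketch of the difference between the two accessors, on inline data with names as the index (`people`, `by_label`, and `by_position` are illustrative names). Note that a label slice with `loc` includes both endpoints, while a position slice with `iloc` excludes the end, like ordinary Python slicing.

```python
import pandas as pd

people = pd.DataFrame(
    {"Salary": [50000, 120000, 80000], "Age": [34, 45, 27]},
    index=["John", "Sally", "Alyssa"],
)

by_label = people.loc["John":"Sally"]  # label slice: both endpoints included
by_position = people.iloc[0:1]         # position slice: end excluded

# Single cell: row label + column label vs. row position + column position.
age_by_label = people.loc["Sally", "Age"]
age_by_position = people.iloc[1, 1]

print(list(by_label.index), list(by_position.index))
print(age_by_label, age_by_position)
```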

In [None]:
df

     Name  Salary  Age
0    John   50000   34
1   Sally  120000   45
2  Alyssa   80000   27


In [None]:
# equivalent to: df = df.set_index("Name")
df.set_index("Name", inplace=True)

In [None]:
df

        Salary  Age
Name
John     50000   34
Sally   120000   45
Alyssa   80000   27
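
The two ways of calling `set_index` behave differently, which is easy to miss. A sketch on inline data (`people`, `indexed`, and `returned` are illustrative names): without `inplace`, a new DataFrame is returned; with `inplace=True`, the original is modified and the call returns `None`.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["John", "Sally", "Alyssa"], "Age": [34, 45, 27]})

# Without inplace: a new DataFrame is returned and `people` is left untouched.
indexed = people.set_index("Name")
columns_before = list(people.columns)

# With inplace=True: `people` itself is modified and the call returns None.
returned = people.set_index("Name", inplace=True)

print(columns_before, list(people.columns), returned)
```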


In [None]:
df.loc["Sally"]

Salary    120000
Age           45
Name: Sally, dtype: int64

In [None]:
resultado = df.loc["Sally"]
resultado

Salary    120000
Age           45
Name: Sally, dtype: int64

In [None]:
df.reset_index(inplace=True)

In [None]:
df2 = pd.DataFrame(resultado)

In [None]:
df2

         Sally
Salary  120000
Age         45


In [None]:
df2.reset_index(inplace=True)

In [None]:
df2

    index   Sally
0  Salary  120000
1     Age      45


In [None]:
type(resultado)

pandas.core.series.Series

In [None]:
type(df)

pandas.core.frame.DataFrame

In [None]:
df

     Name  Salary  Age
0    John   50000   34
1   Sally  120000   45
2  Alyssa   80000   27


In [None]:
df.set_index(["Name", "Salary"],inplace=True)

In [None]:
df

               Age
Name   Salary
John   50000    34
Sally  120000   45
Alyssa 80000    27


In [None]:
df.loc["Sally", 120000]

Age    45
Name: (Sally, 120000), dtype: int64

In [None]:
df.reset_index(inplace=True)

In [None]:
df

     Name  Salary  Age
0    John   50000   34
1   Sally  120000   45
2  Alyssa   80000   27


In [None]:
# iloc -- slices
# iloc[row slice, column slice]

In [None]:
df.iloc[1:]

     Name  Salary  Age
1   Sally  120000   45
2  Alyssa   80000   27


In [None]:
df.iloc[0:2]

    Name  Salary  Age
0   John   50000   34
1  Sally  120000   45


In [None]:
df.iloc[0:2, [-2,-1]]

   Salary  Age
0   50000   34
1  120000   45


In [None]:
df.iloc[0:2, 1:]

   Salary  Age
0   50000   34
1  120000   45


In [None]:
df.iloc[0:2, 0:1]

    Name
0   John
1  Sally


In [None]:
value = {
    "CATEGORIA": ["A", "B", "C", "B"],
    "NOMBRE": ["fff", "aaaaa", "aaaaa", "ssssss"]
}
print(value)

{'CATEGORIA': ['A', 'B', 'C', 'B'], 'NOMBRE': ['fff', 'aaaaa', 'aaaaa', 'ssssss']}


In [None]:
dict_to_df = pd.DataFrame(value)

In [None]:
dict_to_df

   CATEGORIA  NOMBRE
0          A     fff
1          B   aaaaa
2          C   aaaaa
3          B  ssssss


In [None]:
dict_to_df.set_index("CATEGORIA", inplace=True)

In [None]:
dict_to_df.loc["B"]

           NOMBRE
CATEGORIA
B           aaaaa
B          ssssss
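
The result above is a DataFrame rather than a Series because the label `"B"` appears twice in the index. A sketch that contrasts a unique label with a repeated one, reusing the same dictionary under the illustrative name `catalog`:

```python
import pandas as pd

catalog = pd.DataFrame(
    {"CATEGORIA": ["A", "B", "C", "B"], "NOMBRE": ["fff", "aaaaa", "aaaaa", "ssssss"]}
).set_index("CATEGORIA")

unique_hit = catalog.loc["A"]    # label appears once  -> Series (a single row)
repeated_hit = catalog.loc["B"]  # label appears twice -> DataFrame (two rows)

print(type(unique_hit).__name__, type(repeated_hit).__name__)
print(catalog.index.is_unique)   # an index is allowed to contain duplicates
```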


In [None]:
df

     Name  Salary  Age
0    John   50000   34
1   Sally  120000   45
2  Alyssa   80000   27


## **RENAMING AND SORTING**

In [None]:
list(df.columns)

['Name', 'Salary', 'Age']

In [None]:
df.columns = ["nombre", "salario", "edad"]
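
Assigning a list to `df.columns` relies on the order of the columns. An alternative is `DataFrame.rename`, which maps old names to new ones explicitly and returns a new DataFrame. A sketch on an inline copy of the data (`people` and `renamed` are illustrative names):

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["John", "Sally"], "Salary": [50000, 120000], "Age": [34, 45]}
)

# Each old name is paired with its new name, so column order does not matter.
renamed = people.rename(columns={"Age": "edad", "Name": "nombre", "Salary": "salario"})

print(list(renamed.columns))
print(list(people.columns))  # the original keeps its names
```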

In [None]:
df

   nombre  salario  edad
0    John    50000    34
1   Sally   120000    45
2  Alyssa    80000    27


In [None]:
list(df.columns)

['nombre', 'salario', 'edad']

In [None]:
df[["edad", "salario", "nombre"]]

   edad  salario  nombre
0    34    50000    John
1    45   120000   Sally
2    27    80000  Alyssa


In [None]:
df_ord = df.sort_values(by="edad")
df_ord

   nombre  salario  edad
2  Alyssa    80000    27
0    John    50000    34
1   Sally   120000    45


In [None]:
df.sort_values(by="edad", ascending=False)

   nombre  salario  edad
1   Sally   120000    45
0    John    50000    34
2  Alyssa    80000    27


In [None]:
df.sort_values(by=["nombre","salario"])

   nombre  salario  edad
2  Alyssa    80000    27
0    John    50000    34
1   Sally   120000    45
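
With unique names, as in the demo data, the second sort key never comes into play. The sketch below uses a small made-up table (`staff`, with hypothetical values) that contains a tie, and passes a list to `ascending` to sort each key in a different direction.

```python
import pandas as pd

# Hypothetical data: "Ana" appears twice, so the second key breaks the tie.
staff = pd.DataFrame({"nombre": ["Ana", "Ana", "Luis"], "salario": [300, 500, 400]})

# nombre ascending, then salario descending within each nombre.
ordered = staff.sort_values(by=["nombre", "salario"], ascending=[True, False])

print(ordered)
```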


## **CREATING NEW FIELDS**

In [None]:
df["constante"] = 1
df

   nombre  salario  edad  constante
0    John    50000    34          1
1   Sally   120000    45          1
2  Alyssa    80000    27          1


In [None]:
df["edad_fraccion"] = df["edad"] / 100
df

   nombre  salario  edad  constante  edad_fraccion
0    John    50000    34          1           0.34
1   Sally   120000    45          1           0.45
2  Alyssa    80000    27          1           0.27


## **APPLYING FUNCTIONS**

In [None]:
# apply: takes a function, either anonymous (lambda) or declared with def

In [None]:
def elevar_al_cuadrado(edad_fraccion: float):
  return edad_fraccion ** 2

In [None]:
# note: despite the column name, this squares `edad`, not `edad_fraccion`
df["edad_fraccion_elevada_1"] = df["edad"].apply(elevar_al_cuadrado)
df

   nombre  salario  edad  constante  edad_fraccion  edad_fraccion_elevada_1
0    John    50000    34          1           0.34                     1156
1   Sally   120000    45          1           0.45                     2025
2  Alyssa    80000    27          1           0.27                      729


In [None]:
df["edad_fra_elev_2"] = df["edad"].apply(lambda valor: valor ** 2)
df

   nombre  salario  edad  constante  edad_fraccion  edad_fraccion_elevada_1  edad_fra_elev_2
0    John    50000    34          1           0.34                     1156             1156
1   Sally   120000    45          1           0.45                     2025             2025
2  Alyssa    80000    27          1           0.27                      729              729
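
For simple arithmetic like this, operating on the whole column at once gives the same values as `apply` and is usually faster, since it avoids a Python-level function call per element. A sketch on an inline column (the `edad` values are copied from the demo data; the variable names are illustrative):

```python
import pandas as pd

people = pd.DataFrame({"edad": [34, 45, 27]})

squared_with_apply = people["edad"].apply(lambda valor: valor ** 2)
squared_vectorized = people["edad"] ** 2  # operates on the whole column at once

print(squared_with_apply.tolist())
print(squared_vectorized.tolist())
```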


In [None]:
def ajustar_boundaries(fila):
  fila["edad_ajustada"] = fila.constante + fila.edad_fraccion
  fila["nombre"] = fila["nombre"].upper()
  return fila

In [None]:
# axis=0 -- apply the function to each column
# axis=1 -- apply the function to each row
df.apply(ajustar_boundaries, axis=1)

   nombre  salario  edad  constante  edad_fraccion  edad_fraccion_elevada_1  edad_fra_elev_2  edad_ajustada
0    JOHN    50000    34          1           0.34                     1156             1156           1.34
1   SALLY   120000    45          1           0.45                     2025             2025           1.45
2  ALYSSA    80000    27          1           0.27                      729              729           1.27
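
The result of a row-wise `apply` is a new DataFrame, so it has to be assigned to a name to be kept. The sketch below shows this on inline data, using a hypothetical helper `adjust_row` that copies each row before modifying it so the input is never mutated.

```python
import pandas as pd

people = pd.DataFrame(
    {
        "nombre": ["John", "Sally", "Alyssa"],
        "constante": [1, 1, 1],
        "edad_fraccion": [0.34, 0.45, 0.27],
    }
)


def adjust_row(row):
    """Return a modified copy of one row (hypothetical helper)."""
    row = row.copy()
    row["edad_ajustada"] = row["constante"] + row["edad_fraccion"]
    row["nombre"] = row["nombre"].upper()
    return row


adjusted = people.apply(adjust_row, axis=1)  # axis=1: call once per row

print(list(adjusted.columns))
print(list(people.columns))  # the input keeps its original columns
```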
