<a href="https://colab.research.google.com/github/AenimaLabs/pandas_tratamiento_manipulacion_datos/blob/main/Pandas_tratamiento_manipulacion_datos.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BUSINESS PROBLEM

## **Smart Pricing for Accommodation**

### Definition
An automated, dynamic strategy for estimating accommodation prices based on factors such as:
- Supply and demand
- Seasonality
- Local events
- Characteristics of the location

## Objective
Maximize the owner's revenue and profitability by adjusting prices to market conditions.

## How It Works
- Prices are adjusted automatically.
- They **rise** when demand is high, to optimize revenue.
- They **fall** when demand drops, to maintain occupancy.
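
The behavior described above can be sketched as a small rule. This is only an illustration, not the pricing model itself: the function name `adjust_price`, the use of an occupancy rate as a proxy for demand, the thresholds (0.8 and 0.4) and the multipliers are all hypothetical values chosen for the example.

```python
def adjust_price(base_price: float, occupancy_rate: float) -> float:
    """Return an adjusted nightly price from a hypothetical demand rule.

    occupancy_rate is the share of booked nights (0.0 to 1.0), used here
    as a stand-in for demand. Thresholds and multipliers are illustrative.
    """
    if occupancy_rate >= 0.8:   # high demand: raise the price
        return round(base_price * 1.15, 2)
    if occupancy_rate <= 0.4:   # low demand: lower the price to keep occupancy
        return round(base_price * 0.90, 2)
    return base_price           # otherwise keep the base price


print(adjust_price(100.0, 0.9))
print(adjust_price(100.0, 0.2))
```

A machine-learning approach would replace these fixed thresholds with values learned from data such as the listings cleaned later in this notebook.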

## Technologies Used
- **Machine learning**: Enables advanced analysis of data and behavior patterns.
- **Rule-based model**: Built on logic and heuristics, useful in simpler or more controlled scenarios.

## Advantages of Machine Learning
- Analyzes large volumes of data.
- Detects complex patterns in consumer behavior.
- Enables more precise, dynamic adjustments in real time.

## Column Descriptions for the `datos_hosting.json` File

The meaning of each column in the dataset is detailed below:

## 1. `evaluacion_general`
- **Description**: Average score given to rate the quality of the stay at the property.
- **Data type**: Numeric (the rows shown in this notebook contain values such as 9.0 and 10.0).

## 2. `experiencia_local`
- **Description**: Description of the experiences offered during the stay at the property.
- **Data type**: Text.

## 3. `max_hospedes`
- **Description**: Maximum number of guests the property accommodates.
- **Data type**: Integer.

## 4. `descripcion_local`
- **Description**: General description of the property.
- **Data type**: Text.

## 5. `descripcion_vecindad`
- **Description**: Description of the neighborhood where the property is located.
- **Data type**: Text.

## 6. `cantidad_baños`
- **Description**: Number of bathrooms available at the property.
- **Data type**: Integer or decimal (if half bathrooms are counted).

## 7. `cantidad_cuartos`
- **Description**: Number of bedrooms available at the property.
- **Data type**: Integer.

## 8. `cantidad_camas`
- **Description**: Number of beds available at the property.
- **Data type**: Integer.

## 9. `modelo_cama`
- **Description**: Type or model of bed offered (for example: double bed, bunk bed, sofa bed, etc.).
- **Data type**: Text.

## 10. `comodidades`
- **Description**: List of amenities or services the property offers (for example: wifi, air conditioning, pool, etc.).
- **Data type**: List or comma-separated text.

## 11. `cuota_deposito`
- **Description**: Minimum deposit fee required as a guarantee for booking the accommodation.
- **Data type**: Numeric.

## 12. `cuota_limpieza`
- **Description**: Additional charge for the final cleaning service.
- **Data type**: Numeric.

## 13. `precio`
- **Description**: Base daily price for a stay at the property.
- **Data type**: Numeric.

## Understanding the Problem

In [1]:
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
ruta = "/content/drive/MyDrive/OracleOne/datos_hosting.json"

datos_hosting = pd.read_json(ruta)  # the file loads as a single column of dicts, which still need to be unpacked
display(datos_hosting.head())

Unnamed: 0,info_inmuebles
0,"{'evaluacion_general': '10.0', 'experiencia_lo..."
1,"{'evaluacion_general': '10.0', 'experiencia_lo..."
2,"{'evaluacion_general': '10.0', 'experiencia_lo..."
3,"{'evaluacion_general': '10.0', 'experiencia_lo..."
4,"{'evaluacion_general': '10.0', 'experiencia_lo..."


In [3]:
datos_hosting_normalizados = pd.json_normalize(datos_hosting['info_inmuebles'])
display(datos_hosting_normalizados.head())

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,[This clean and comfortable one bedroom sits r...,[Lower Queen Anne is near the Seattle Center (...,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[Real Bed, Futon, Futon, Pull-out Sofa, Real B...","[{Internet,""Wireless Internet"",Kitchen,""Free P...","[$0, $0, $0, $0, $0, $350.00, $350.00, $350.00...","[$0, $0, $0, $20.00, $15.00, $28.00, $35.00, $...","[$110.00, $45.00, $55.00, $52.00, $85.00, $50...."
1,10.0,--,10,[Welcome to the heart of the 'Ballard Brewery ...,"[--, Capital Hill is the heart of Seattle, bor...","[2, 3, 2, 3, 3, 3, 2, 1, 2, 2, 2]","[3, 4, 2, 3, 3, 3, 3, 3, 3, 4, 3]","[5, 6, 8, 3, 3, 5, 4, 5, 6, 7, 4]","[Real Bed, Real Bed, Real Bed, Real Bed, Real ...","[{TV,Internet,""Wireless Internet"",Kitchen,""Fre...","[$500.00, $300.00, $0, $300.00, $300.00, $360....","[$125.00, $100.00, $85.00, $110.00, $110.00, $...","[$350.00, $300.00, $425.00, $300.00, $285.00, ..."
2,10.0,--,11,[New modern house built in 2013. Spectacular ...,[Upper Queen Anne is a charming neighborhood f...,[4],[5],[7],[Real Bed],"[{TV,""Cable TV"",Internet,""Wireless Internet"",""...","[$1,000.00]",[$300.00],[$975.00]
3,10.0,--,12,[Our NW style home is 3200+ sq ft with 3 level...,[The Views from our top floor! Wallingford ha...,"[3, 3, 3, 3, 3, 3, 3, 3]","[6, 6, 5, 5, 5, 5, 4, 4]","[6, 6, 7, 8, 7, 7, 6, 6]","[Real Bed, Real Bed, Real Bed, Real Bed, Real ...","[{Internet,""Wireless Internet"",Kitchen,""Free P...","[$500.00, $500.00, $500.00, $500.00, $500.00, ...","[$225.00, $300.00, $250.00, $250.00, $250.00, ...","[$490.00, $550.00, $350.00, $350.00, $350.00, ..."
4,10.0,--,14,"[Perfect for groups. 2 bedrooms, full bathroom...",[Safeway grocery store within walking distance...,"[2, 3]","[2, 6]","[3, 9]","[Real Bed, Real Bed]","[{TV,Internet,""Wireless Internet"",Kitchen,""Fre...","[$300.00, $2,000.00]","[$40.00, $150.00]","[$200.00, $545.00]"
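
To see what `pd.json_normalize` does in isolation, here is a minimal sketch on two made-up records. The values in `registros` are hypothetical and only mimic the shape of the `info_inmuebles` column (one dict per row, with both scalar and list-valued fields).

```python
import pandas as pd

# Hypothetical records mimicking the shape of the `info_inmuebles` column:
# one dict per row, with a scalar field and a list-valued field.
registros = [
    {"max_hospedes": "1", "precio": ["$110.00", "$45.00"]},
    {"max_hospedes": "11", "precio": ["$975.00"]},
]

# json_normalize turns each dict key into its own column;
# list values are kept as lists inside a single cell.
plano = pd.json_normalize(registros)
print(plano)
```

The list-valued cells are why every column in the real table still has dtype `object` after this step.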


In [4]:
datos_hosting_normalizados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 70 entries, 0 to 69
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   evaluacion_general    70 non-null     object
 1   experiencia_local     70 non-null     object
 2   max_hospedes          70 non-null     object
 3   descripcion_local     70 non-null     object
 4   descripcion_vecindad  70 non-null     object
 5   cantidad_baños        70 non-null     object
 6   cantidad_cuartos      70 non-null     object
 7   cantidad_camas        70 non-null     object
 8   modelo_cama           70 non-null     object
 9   comodidades           70 non-null     object
 10  cuota_deposito        70 non-null     object
 11  cuota_limpieza        70 non-null     object
 12  precio                70 non-null     object
dtypes: object(13)
memory usage: 7.2+ KB


## Numeric Data

In [5]:
columnas = list(datos_hosting_normalizados.columns)
columnas

['evaluacion_general',
 'experiencia_local',
 'max_hospedes',
 'descripcion_local',
 'descripcion_vecindad',
 'cantidad_baños',
 'cantidad_cuartos',
 'cantidad_camas',
 'modelo_cama',
 'comodidades',
 'cuota_deposito',
 'cuota_limpieza',
 'precio']

In [6]:
datos_explode = datos_hosting_normalizados.explode(columnas[3:])  # unpack the list-valued columns: one row per list element
display(datos_explode)

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$0,$110.00
0,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$45.00
0,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$55.00
0,10.0,--,1,Very lovely and cozy room for one. Convenientl...,"Ballard is lovely, vibrant and one of the most...",1,1,1,Pull-out Sofa,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$20.00,$52.00
0,10.0,--,1,The “Studio at Mibbett Hollow' is in a Beautif...,--,1,1,1,Real Bed,"{""Wireless Internet"",Kitchen,""Free Parking on ...",$0,$15.00,$85.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,,--,8,Beautiful craftsman home in the historic Wedgw...,--,3,4,5,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...","$1,000.00",$178.00,$299.00
68,,--,8,Located in a very easily accessible area of Se...,"Quiet, dead end street near I-5. The proximity...",2,4,4,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",Ki...",$0,$99.00,$199.00
68,,--,8,This home is fully furnished and available wee...,--,1,3,4,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",$0,$0,$400.00
69,,--,9,This business-themed modern home features: *H...,Your hosts made Madison Valley their home when...,2,3,6,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...","$1,000.00",$150.00,$250.00


In [7]:
datos_explode.reset_index(drop=True, inplace=True)  # rebuild a clean 0..n-1 index, since explode repeats the original labels
display(datos_explode)

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$0,$110.00
1,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$45.00
2,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$55.00
3,10.0,--,1,Very lovely and cozy room for one. Convenientl...,"Ballard is lovely, vibrant and one of the most...",1,1,1,Pull-out Sofa,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$20.00,$52.00
4,10.0,--,1,The “Studio at Mibbett Hollow' is in a Beautif...,--,1,1,1,Real Bed,"{""Wireless Internet"",Kitchen,""Free Parking on ...",$0,$15.00,$85.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3813,,--,8,Beautiful craftsman home in the historic Wedgw...,--,3,4,5,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...","$1,000.00",$178.00,$299.00
3814,,--,8,Located in a very easily accessible area of Se...,"Quiet, dead end street near I-5. The proximity...",2,4,4,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",Ki...",$0,$99.00,$199.00
3815,,--,8,This home is fully furnished and available wee...,--,1,3,4,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",$0,$0,$400.00
3816,,--,9,This business-themed modern home features: *H...,Your hosts made Madison Valley their home when...,2,3,6,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...","$1,000.00",$150.00,$250.00
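
The `explode` plus `reset_index` steps can be reproduced on a tiny frame. This is a sketch with invented values; it assumes pandas 1.3 or later, where `explode` accepts a list of columns, and it relies on the lists in each row having the same length across the exploded columns.

```python
import pandas as pd

# Hypothetical frame: one scalar column and two list-valued columns
df = pd.DataFrame({
    "max_hospedes": ["1", "11"],
    "cantidad_camas": [[1, 1], [7]],
    "precio": [["$110.00", "$45.00"], ["$975.00"]],
})

# Explode both list columns together; scalar columns are repeated per element
filas = df.explode(["cantidad_camas", "precio"])
print(filas.index.tolist())  # original row labels are repeated here

# Replace the repeated labels with a fresh sequential index
filas = filas.reset_index(drop=True)
print(filas)
```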


In [8]:
datos_explode.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   evaluacion_general    3818 non-null   object
 1   experiencia_local     3818 non-null   object
 2   max_hospedes          3818 non-null   object
 3   descripcion_local     3818 non-null   object
 4   descripcion_vecindad  3818 non-null   object
 5   cantidad_baños        3818 non-null   object
 6   cantidad_cuartos      3818 non-null   object
 7   cantidad_camas        3818 non-null   object
 8   modelo_cama           3818 non-null   object
 9   comodidades           3818 non-null   object
 10  cuota_deposito        3818 non-null   object
 11  cuota_limpieza        3818 non-null   object
 12  precio                3818 non-null   object
dtypes: object(13)
memory usage: 387.9+ KB


In [9]:
import numpy as np

In [10]:
# convert the column to int64; this only works if every element is written as a number
datos_explode['max_hospedes'] = datos_explode['max_hospedes'].astype(np.int64)
display(datos_explode.head(3))

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$0,$110.00
1,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$45.00
2,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$55.00


In [11]:
datos_explode.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   evaluacion_general    3818 non-null   object
 1   experiencia_local     3818 non-null   object
 2   max_hospedes          3818 non-null   int64 
 3   descripcion_local     3818 non-null   object
 4   descripcion_vecindad  3818 non-null   object
 5   cantidad_baños        3818 non-null   object
 6   cantidad_cuartos      3818 non-null   object
 7   cantidad_camas        3818 non-null   object
 8   modelo_cama           3818 non-null   object
 9   comodidades           3818 non-null   object
 10  cuota_deposito        3818 non-null   object
 11  cuota_limpieza        3818 non-null   object
 12  precio                3818 non-null   object
dtypes: int64(1), object(12)
memory usage: 387.9+ KB
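
`astype(np.int64)` raises an error as soon as one value is not a valid number. As an alternative, not used in this notebook, `pd.to_numeric` with `errors="coerce"` turns unparseable entries into missing values. The sample Series below is hypothetical.

```python
import pandas as pd

valores = pd.Series(["1", "10", "--"])  # hypothetical column with one non-numeric entry

# Unparseable strings become NaN instead of raising an error
numeros = pd.to_numeric(valores, errors="coerce")

# The nullable Int64 dtype keeps integers while still allowing missing values
enteros = numeros.astype("Int64")
print(enteros)
```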


In [12]:
col_numericas = ['cantidad_baños', 'cantidad_cuartos', 'cantidad_camas']

In [13]:
datos_explode[col_numericas] = datos_explode[col_numericas].astype(np.int64)
display(datos_explode.head(3))

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$0,$110.00
1,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$45.00
2,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$55.00


In [14]:
datos_explode.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   evaluacion_general    3818 non-null   object
 1   experiencia_local     3818 non-null   object
 2   max_hospedes          3818 non-null   int64 
 3   descripcion_local     3818 non-null   object
 4   descripcion_vecindad  3818 non-null   object
 5   cantidad_baños        3818 non-null   int64 
 6   cantidad_cuartos      3818 non-null   int64 
 7   cantidad_camas        3818 non-null   int64 
 8   modelo_cama           3818 non-null   object
 9   comodidades           3818 non-null   object
 10  cuota_deposito        3818 non-null   object
 11  cuota_limpieza        3818 non-null   object
 12  precio                3818 non-null   object
dtypes: int64(4), object(9)
memory usage: 387.9+ KB


In [15]:
datos_explode['evaluacion_general'] = datos_explode['evaluacion_general'].astype(np.float64)
display(datos_explode.head(3))

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$0,$110.00
1,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$45.00
2,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,$55.00


In [16]:
datos_explode.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   evaluacion_general    3162 non-null   float64
 1   experiencia_local     3818 non-null   object 
 2   max_hospedes          3818 non-null   int64  
 3   descripcion_local     3818 non-null   object 
 4   descripcion_vecindad  3818 non-null   object 
 5   cantidad_baños        3818 non-null   int64  
 6   cantidad_cuartos      3818 non-null   int64  
 7   cantidad_camas        3818 non-null   int64  
 8   modelo_cama           3818 non-null   object 
 9   comodidades           3818 non-null   object 
 10  cuota_deposito        3818 non-null   object 
 11  cuota_limpieza        3818 non-null   object 
 12  precio                3818 non-null   object 
dtypes: float64(1), int64(4), object(8)
memory usage: 387.9+ KB
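
The `info()` output above reports 3162 non-null values out of 3818 for `evaluacion_general`, so 656 rows have no rating after the conversion to float. A minimal sketch of how such gaps can be counted directly, using a made-up Series in place of the real column:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings with two missing entries
notas = pd.Series([10.0, 9.0, np.nan, np.nan], name="evaluacion_general")

faltantes = int(notas.isna().sum())   # number of missing values
proporcion = notas.isna().mean()      # share of missing values
print(faltantes, proporcion)
```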


In [17]:
datos_explode['precio'] = datos_explode['precio'].fillna('0.0')  # replace missing values with '0.0' (assigned back instead of using inplace)

# Series.apply runs the lambda on each element: drop '$' and ',', strip whitespace, then cast to float
datos_explode['precio'] = datos_explode['precio'].apply(
    lambda x: x.replace('$', '').replace(',', '').strip()
).astype(np.float64)

display(datos_explode.head(3))


Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",$0,$0,110.0
1,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,45.0
2,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",$0,$0,55.0
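
The same cleanup can be written with vectorized string methods and wrapped in a helper so it can be reused for any currency column. This is a sketch, not the notebook's own code: the name `limpiar_moneda` and the sample values are hypothetical, and filling missing prices with zero simply mirrors the choice made in the cell above.

```python
import pandas as pd


def limpiar_moneda(serie: pd.Series) -> pd.Series:
    """Convert strings such as '$1,000.00' to floats (hypothetical helper)."""
    return (
        serie.fillna("0")                       # missing values become zero
        .astype(str)
        .str.replace(r"[$,]", "", regex=True)   # drop currency sign and thousands separator
        .str.strip()
        .astype("float64")
    )


ejemplo = pd.Series(["$110.00", "$1,000.00", None, "$0"])
print(limpiar_moneda(ejemplo))
```

Treating a missing price as 0.0 keeps the column numeric, but it will pull down averages; leaving the value missing is a reasonable alternative depending on the later analysis.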


In [18]:
col_flotantes = ['cuota_deposito', 'cuota_limpieza']

# DataFrame.applymap is deprecated since pandas 2.1, so each column is cleaned with vectorized string methods instead
datos_explode[col_flotantes] = datos_explode[col_flotantes].apply(
    lambda col: col.astype(str).str.replace(r'[$,]', '', regex=True).str.strip()
).astype(np.float64)

display(datos_explode.head(3))


Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",0.0,0.0,110.0
1,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,45.0
2,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,55.0


In [19]:
datos_explode.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   evaluacion_general    3162 non-null   float64
 1   experiencia_local     3818 non-null   object 
 2   max_hospedes          3818 non-null   int64  
 3   descripcion_local     3818 non-null   object 
 4   descripcion_vecindad  3818 non-null   object 
 5   cantidad_baños        3818 non-null   int64  
 6   cantidad_cuartos      3818 non-null   int64  
 7   cantidad_camas        3818 non-null   int64  
 8   modelo_cama           3818 non-null   object 
 9   comodidades           3818 non-null   object 
 10  cuota_deposito        3818 non-null   float64
 11  cuota_limpieza        3818 non-null   float64
 12  precio                3818 non-null   float64
dtypes: float64(4), int64(4), object(5)
memory usage: 387.9+ KB


In [20]:
display(datos_explode.sample(5))

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
1311,10.0,--,4,The old Rainier brewery is a Seattle landmark ...,"Within a two-minute drive, or 10 minute bike r...",1,2,3,Real Bed,"{Internet,""Wireless Internet"",""Wheelchair Acce...",0.0,0.0,320.0
1331,10.0,--,4,"Beautifully restored, open-concept home, locat...","This is a fantastic, and pretty darn quiet, ne...",2,3,2,Real Bed,"{""Wireless Internet"",Kitchen,""Free Parking on ...",0.0,45.0,65.0
3459,,--,2,Right next to the space needle and just a few ...,--,1,0,1,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",0.0,0.0,100.0
1273,10.0,--,4,"A modern 1b1b apartment, in the heart of Seatt...","This is the heart of Seattle, the cross-sectio...",1,1,3,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",200.0,40.0,168.0
3028,9.0,--,6,The Shelby is centrally located at the interse...,In the heart of Downtown Seattle! Nearby Attra...,2,2,3,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",Ki...",0.0,99.0,138.0


## Text Data

In [21]:
datos_explode.head(3)

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,This clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",0.0,0.0,110.0
1,10.0,--,1,Our century old Upper Queen Anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,45.0
2,10.0,--,1,Cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,55.0


In [22]:
datos_texto = datos_explode.copy()
datos_texto['descripcion_local'] = datos_texto['descripcion_local'].str.lower()  # lowercase first, as a step before tokenizing
display(datos_texto.head(3))

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,this clean and comfortable one bedroom sits ri...,Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",0.0,0.0,110.0
1,10.0,--,1,our century old upper queen anne house is loca...,"Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,45.0
2,10.0,--,1,cozy room in two-bedroom apartment along the l...,The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,55.0


In [23]:
datos_texto['descripcion_local'] = datos_texto['descripcion_local'].str.replace(r"[^a-zA-Z0-9\-']", ' ', regex=True)  # replace everything except letters, digits, hyphens and apostrophes with a space
display(datos_texto['descripcion_local'].head(3))

Unnamed: 0,descripcion_local
0,this clean and comfortable one bedroom sits ri...
1,our century old upper queen anne house is loca...
2,cozy room in two-bedroom apartment along the l...


In [24]:
datos_texto['descripcion_local'] = datos_texto['descripcion_local'].str.replace(r'(?<!\w)-(?!\w)', ' ', regex=True)  # drop standalone hyphens; hyphens inside words such as two-bedroom are kept
display(datos_texto['descripcion_local'].head(3))

Unnamed: 0,descripcion_local
0,this clean and comfortable one bedroom sits ri...
1,our century old upper queen anne house is loca...
2,cozy room in two-bedroom apartment along the l...


In [25]:
datos_texto['descripcion_local'] = datos_texto['descripcion_local'].str.split()  # tokenize: split on whitespace into a list of words
display(datos_texto.head(3))

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,"[this, clean, and, comfortable, one, bedroom, ...",Lower Queen Anne is near the Seattle Center (s...,1,1,1,Real Bed,"{Internet,""Wireless Internet"",Kitchen,""Free Pa...",0.0,0.0,110.0
1,10.0,--,1,"[our, century, old, upper, queen, anne, house,...","Upper Queen Anne is a really pleasant, unique ...",1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,45.0
2,10.0,--,1,"[cozy, room, in, two-bedroom, apartment, along...",The convenience of being in Seattle but on the...,1,1,1,Futon,"{TV,Internet,""Wireless Internet"",Kitchen,""Free...",0.0,0.0,55.0


In [26]:
datos_texto['comodidades'] = datos_texto['comodidades'].str.replace(r'[{}"]', '', regex=True)  # strip the braces and double quotes
display(datos_texto['comodidades'].head(3))

Unnamed: 0,comodidades
0,"Internet,Wireless Internet,Kitchen,Free Parkin..."
1,"TV,Internet,Wireless Internet,Kitchen,Free Par..."
2,"TV,Internet,Wireless Internet,Kitchen,Free Par..."


In [27]:
datos_texto['comodidades'] = datos_texto['comodidades'].str.split(',')  # split on commas into a list of amenities
display(datos_texto['comodidades'].head(3))

Unnamed: 0,comodidades
0,"[Internet, Wireless Internet, Kitchen, Free Pa..."
1,"[TV, Internet, Wireless Internet, Kitchen, Fre..."
2,"[TV, Internet, Wireless Internet, Kitchen, Fre..."
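
The two `comodidades` steps are easier to follow on a single string. The sample below is a shortened, hypothetical value in the same `{...}` format shown in the tables above.

```python
import re

crudo = '{TV,"Cable TV",Internet,"Wireless Internet"}'  # shortened, hypothetical sample

limpio = re.sub(r'[{}"]', '', crudo)  # remove braces and double quotes
lista = limpio.split(',')             # one list item per amenity
print(lista)
```

Because the split is on every comma, an amenity whose name itself contained a comma would be broken into two items; none of the values visible in this notebook's output do.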


In [28]:
# Convert the text to lowercase
datos_texto['descripcion_vecindad'] = datos_texto['descripcion_vecindad'].str.lower()
# Replace special characters with spaces, then drop standalone hyphens
datos_texto['descripcion_vecindad'] = datos_texto['descripcion_vecindad'].str.replace(r"[^a-zA-Z0-9\-']", ' ', regex=True)
datos_texto['descripcion_vecindad'] = datos_texto['descripcion_vecindad'].str.replace(r'(?<!\w)-(?!\w)', '', regex=True)
# Split the text into a list of tokens
datos_texto['descripcion_vecindad'] = datos_texto['descripcion_vecindad'].str.split()
display(datos_texto.head())

Unnamed: 0,evaluacion_general,experiencia_local,max_hospedes,descripcion_local,descripcion_vecindad,cantidad_baños,cantidad_cuartos,cantidad_camas,modelo_cama,comodidades,cuota_deposito,cuota_limpieza,precio
0,10.0,--,1,"[this, clean, and, comfortable, one, bedroom, ...","[lower, queen, anne, is, near, the, seattle, c...",1,1,1,Real Bed,"[Internet, Wireless Internet, Kitchen, Free Pa...",0.0,0.0,110.0
1,10.0,--,1,"[our, century, old, upper, queen, anne, house,...","[upper, queen, anne, is, a, really, pleasant, ...",1,1,1,Futon,"[TV, Internet, Wireless Internet, Kitchen, Fre...",0.0,0.0,45.0
2,10.0,--,1,"[cozy, room, in, two-bedroom, apartment, along...","[the, convenience, of, being, in, seattle, but...",1,1,1,Futon,"[TV, Internet, Wireless Internet, Kitchen, Fre...",0.0,0.0,55.0
3,10.0,--,1,"[very, lovely, and, cozy, room, for, one, conv...","[ballard, is, lovely, vibrant, and, one, of, t...",1,1,1,Pull-out Sofa,"[Internet, Wireless Internet, Kitchen, Free Pa...",0.0,20.0,52.0
4,10.0,--,1,"[the, studio, at, mibbett, hollow', is, in, a,...",[],1,1,1,Real Bed,"[Wireless Internet, Kitchen, Free Parking on P...",0.0,15.0,85.0
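
The four text steps (lowercase, replace special characters, drop standalone hyphens, split) can be collected into one function and tried on a plain string. This is a sketch using the standard `re` module rather than pandas; the function name `tokenizar` and the sample sentence are hypothetical.

```python
import re


def tokenizar(texto: str) -> list[str]:
    """Apply the same cleaning steps as the cells above to one string."""
    texto = texto.lower()
    texto = re.sub(r"[^a-z0-9\-']", " ", texto)    # keep letters, digits, hyphens, apostrophes
    texto = re.sub(r"(?<!\w)-(?!\w)", " ", texto)  # drop hyphens that are not attached to a word
    return texto.split()


print(tokenizar("Cozy room in two-bedroom apartment - near I-5!"))
print(tokenizar("--"))
```

The second call returns an empty list, which matches the `[]` shown in row 4 of the table above, where the original neighborhood description was `--`.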


## Date and Time Data

In [29]:
import datetime

# create a datetime object with the current date and time
ahora = datetime.datetime.now()

print("Current date and time:", ahora)

Current date and time: 2025-07-13 13:25:15.102347


In [30]:
# create a date object with today's date
hoy = datetime.date.today()

print("Today's date:", hoy)

Today's date: 2025-07-13


In [31]:
# create two date objects with different dates
data_1 = datetime.date(1981, 10, 31)
data_2 = datetime.date.today()

# subtracting two dates returns a timedelta
diferencia = data_2 - data_1

print("Difference between the two dates:", diferencia)

Difference between the two dates: 15961 days, 0:00:00
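
Subtracting two `date` objects yields a `datetime.timedelta`, whose `.days` attribute gives the difference as an integer. The sketch below pins the second date to 2025-07-13 (the run date shown in the output above) so the result is reproducible; the variable names are hypothetical.

```python
import datetime

inicio = datetime.date(1981, 10, 31)
fin = datetime.date(2025, 7, 13)   # fixed date so the result does not depend on when it runs

diferencia = fin - inicio          # a datetime.timedelta
dias = diferencia.days             # whole days as an int
anios_aprox = dias / 365.25        # rough conversion to years

print(dias, round(anios_aprox, 1))
```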


In [32]:
ruta = "/content/drive/MyDrive/OracleOne/inmuebles_disponibles.json"

inmuebles_disponibles = pd.read_json(ruta)
display(inmuebles_disponibles.head())

Unnamed: 0,id,fecha,lugar_disponible,precio
0,857,2016-01-04,False,
1,857,2016-01-05,False,
2,857,2016-01-06,False,
3,857,2016-01-07,False,
4,857,2016-01-08,False,


In [33]:
inmuebles_disponibles.info()

<class 'pandas.core.frame.DataFrame'>
Index: 365000 entries, 0 to 364999
Data columns (total 4 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   id                365000 non-null  int64 
 1   fecha             365000 non-null  object
 2   lugar_disponible  365000 non-null  bool  
 3   precio            270547 non-null  object
dtypes: bool(1), int64(1), object(2)
memory usage: 11.5+ MB


In [34]:
inmuebles_disponibles['fecha'] = pd.to_datetime(inmuebles_disponibles['fecha'])
display(inmuebles_disponibles.sample(3))
type(inmuebles_disponibles['fecha'])

Unnamed: 0,id,fecha,lugar_disponible,precio
63131,849,2016-12-20,False,
235460,3403,2016-02-08,True,$87.00
195929,582,2016-10-19,True,$79.00


In [35]:
inmuebles_disponibles.info()

<class 'pandas.core.frame.DataFrame'>
Index: 365000 entries, 0 to 364999
Data columns (total 4 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   id                365000 non-null  int64         
 1   fecha             365000 non-null  datetime64[ns]
 2   lugar_disponible  365000 non-null  bool          
 3   precio            270547 non-null  object        
dtypes: bool(1), datetime64[ns](1), int64(1), object(1)
memory usage: 11.5+ MB
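
Once a column is `datetime64`, the `.dt` accessor exposes its components. A minimal sketch with made-up values, which also shows how `errors="coerce"` handles an entry that is not a date; the explicit `format` argument is an assumption about how the strings are written.

```python
import pandas as pd

fechas = pd.Series(["2016-01-04", "2016-12-20", "not a date"])  # hypothetical values

# Entries that do not match the format become NaT instead of raising an error
convertidas = pd.to_datetime(fechas, format="%Y-%m-%d", errors="coerce")

print(convertidas.dt.year)
print(convertidas.dt.day_name())
```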


In [36]:
inmuebles_disponibles['fecha'].dt.strftime('%Y-%m')  # returns a new Series of 'YYYY-MM' strings; it is not assigned, so the DataFrame is unchanged
display(inmuebles_disponibles.head())
inmuebles_disponibles.info()

,id,fecha,lugar_disponible,precio
0,857,2016-01-04,False,
1,857,2016-01-05,False,
2,857,2016-01-06,False,
3,857,2016-01-07,False,
4,857,2016-01-08,False,


<class 'pandas.core.frame.DataFrame'>
Index: 365000 entries, 0 to 364999
Data columns (total 4 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   id                365000 non-null  int64         
 1   fecha             365000 non-null  datetime64[ns]
 2   lugar_disponible  365000 non-null  bool          
 3   precio            270547 non-null  object        
dtypes: bool(1), datetime64[ns](1), int64(1), object(1)
memory usage: 11.5+ MB
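
The output above shows that `fecha` is unchanged after calling `strftime`. A minimal sketch of how the formatted values could be kept, assuming a new column named `anio_mes` (a hypothetical name, not part of the dataset):

```python
import pandas as pd

df = pd.DataFrame({"fecha": pd.to_datetime(["2016-01-04", "2016-02-08"])})

# strftime returns a new Series of strings; df["fecha"] itself is not modified
mes = df["fecha"].dt.strftime("%Y-%m")

# Keep the result by assigning it to a new (hypothetical) column
df["anio_mes"] = mes

print(df.dtypes)
```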


In [37]:
# count the available listings per month: summing a boolean column counts its True values
subset = inmuebles_disponibles.groupby(inmuebles_disponibles['fecha'].dt.strftime('%Y-%m'))['lugar_disponible'].sum()
display(subset)

fecha,lugar_disponible
2016-01,16543
2016-02,20128
2016-03,23357
2016-04,22597
2016-05,23842
2016-06,23651
2016-07,22329
2016-08,22529
2016-09,22471
2016-10,23765


#Practice

In [38]:
# Import pandas
import pandas as pd
# Read the JSON file with read_json
url = "/content/drive/MyDrive/OracleOne/dados_vendas_clientes.json"
datos_ventas = pd.read_json(url)
# Apply json_normalize to the dados_vendas column
datos_ventas = pd.json_normalize(datos_ventas['dados_vendas'])
# Show the values
display(datos_ventas)

,Data de venda,Cliente,Valor da compra
0,06/06/2022,"[@ANA _LUCIA 321, DieGO ARMANDIU 210, DieGO AR...","[R$ 836,5, R$ 573,33, R$ 392,8, R$ 512,34]"
1,07/06/2022,"[Isabely JOanes 738, Isabely JOanes 738, Isabe...","[R$ 825,31, R$ 168,07, R$ 339,18, R$ 314,69]"
2,08/06/2022,"[Isabely JOanes 738, JOãO Gabriel 671, Julya m...","[R$ 682,05, R$ 386,34, R$ 622,65, R$ 630,79]"
3,09/06/2022,"[Julya meireles 914, MaRIA Julia 444, MaRIA Ju...","[R$ 390,3, R$ 759,16, R$ 334,47, R$ 678,78]"
4,10/06/2022,"[MaRIA Julia 444, PEDRO PASCO 812, Paulo castr...","[R$ 314,24, R$ 311,15, R$ 899,16, R$ 885,24]"
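
To see what `json_normalize` does without the file, here is a minimal sketch on hypothetical records shaped like the rows above. Each dictionary becomes one row and each key becomes a column; list values are left as lists.

```python
import pandas as pd

# Hypothetical records shaped like the rows shown above
registros = [
    {"Data de venda": "06/06/2022",
     "Cliente": ["ana 1", "diego 2"],
     "Valor da compra": ["R$ 836,5", "R$ 573,33"]},
    {"Data de venda": "07/06/2022",
     "Cliente": ["isabely 3"],
     "Valor da compra": ["R$ 825,31"]},
]

# One column whose cells are dictionaries, similar to what read_json returns here
crudo = pd.DataFrame({"dados_vendas": registros})

# Expand the dictionaries into regular columns
plano = pd.json_normalize(crudo["dados_vendas"].tolist())

print(plano.columns.tolist())
```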


In [39]:

# Read the JSON file with read_json
datos_locacion = pd.read_json('/content/drive/MyDrive/OracleOne/dados_locacao_imoveis.json')
# Apply json_normalize to the dados_locacao column
datos_locacion = pd.json_normalize(datos_locacion['dados_locacao'])
# Show the values
display(datos_locacion)

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101 (blocoAP),"[01/06/2022, 01/07/2022]","[05/06/2022, 03/07/2022]","[$ 1000,0 reais, $ 2500,0 reais]"
1,A102 (blocoAP),"[02/06/2022, 02/07/2022]","[02/06/2022, 06/07/2022]","[$ 1100,0 reais, $ 2600,0 reais]"
2,B201 (blocoAP),"[03/06/2022, 03/07/2022]","[07/06/2022, 03/07/2022]","[$ 1200,0 reais, $ 2700,0 reais]"
3,B202 (blocoAP),"[04/06/2022, 04/07/2022]","[07/06/2022, 05/07/2022]","[$ 1300,0 reais, $ 2800,0 reais]"
4,C301 (blocoAP),"[05/06/2022, 05/07/2022]","[10/06/2022, 09/07/2022]","[$ 1400,0 reais, $ 2900,0 reais]"
5,C302 (blocoAP),"[06/06/2022, 06/07/2022]","[08/06/2022, 12/07/2022]","[$ 1500,0 reais, $ 1200,0 reais]"
6,D401 (blocoAP),"[07/06/2022, 07/07/2022]","[07/06/2022, 09/07/2022]","[$ 1600,0 reais, $ 1300,0 reais]"
7,D402 (blocoAP),"[08/06/2022, 08/07/2022]","[10/06/2022, 14/07/2022]","[$ 1700,0 reais, $ 1400,0 reais]"
8,E501 (blocoAP),"[09/06/2022, 09/07/2022]","[10/06/2022, 09/07/2022]","[$ 1800,0 reais, $ 1500,0 reais]"
9,E502 (blocoAP),"[10/06/2022, 10/07/2022]","[16/06/2022, 12/07/2022]","[$ 1900,0 reais, $ 1600,0 reais]"


In [40]:
# Collect the column names and check them
columnas = list(datos_ventas.columns)
columnas

# Unpack the lists with explode (every column except the first holds lists)
datos_ventas = datos_ventas.explode(columnas[1:])
# Reset the row index
datos_ventas.reset_index(drop=True,inplace=True)
# Inspect the DataFrame
display(datos_ventas)

# Check the data types with info
datos_ventas.info()



,Data de venda,Cliente,Valor da compra
0,06/06/2022,@ANA _LUCIA 321,"R$ 836,5"
1,06/06/2022,DieGO ARMANDIU 210,"R$ 573,33"
2,06/06/2022,DieGO ARMANDIU 210,"R$ 392,8"
3,06/06/2022,DieGO ARMANDIU 210,"R$ 512,34"
4,07/06/2022,Isabely JOanes 738,"R$ 825,31"
5,07/06/2022,Isabely JOanes 738,"R$ 168,07"
6,07/06/2022,Isabely JOanes 738,"R$ 339,18"
7,07/06/2022,Isabely JOanes 738,"R$ 314,69"
8,08/06/2022,Isabely JOanes 738,"R$ 682,05"
9,08/06/2022,JOãO Gabriel 671,"R$ 386,34"


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Data de venda    20 non-null     object
 1   Cliente          20 non-null     object
 2   Valor da compra  20 non-null     object
dtypes: object(3)
memory usage: 612.0+ bytes
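
The behaviour of `explode` with several columns can be checked on a tiny example. This is a sketch with hypothetical column names; passing a list of columns requires pandas 1.3 or later, and the lists in each row must have the same length across the exploded columns.

```python
import pandas as pd

df = pd.DataFrame({
    "fecha": ["06/06/2022", "07/06/2022"],
    "cliente": [["a", "b"], ["c"]],
    "valor": [["1,0", "2,0"], ["3,0"]],
})

# Each list element becomes its own row; scalar columns are repeated
largo = df.explode(["cliente", "valor"]).reset_index(drop=True)

print(largo)
```

Without `reset_index(drop=True)`, the exploded rows would keep the index of the row they came from, so index labels would repeat.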


In [41]:
# The numeric column is 'Valor da compra'
datos_ventas['Valor da compra']

# Start the transformation
# Import the numpy library
import numpy as np
# Remove the text present in the values
# Replace the decimal comma with a point
datos_ventas['Valor da compra'] = datos_ventas['Valor da compra'].apply(lambda x: x.replace('R$ ', '').replace(',','.').strip())
# Change the data type to float
datos_ventas['Valor da compra'] = datos_ventas['Valor da compra'].astype(np.float64)
# Check the transformation
datos_ventas.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Data de venda    20 non-null     object 
 1   Cliente          20 non-null     object 
 2   Valor da compra  20 non-null     float64
dtypes: float64(1), object(2)
memory usage: 612.0+ bytes
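
The same cleaning can also be written with the vectorized `.str` accessor instead of `apply` with a lambda. A minimal sketch on three values taken from the table above:

```python
import pandas as pd

valores = pd.Series(["R$ 836,5", "R$ 573,33", "R$ 392,8"])

limpios = (
    valores.str.replace("R$ ", "", regex=False)  # drop the currency prefix
    .str.replace(",", ".", regex=False)          # decimal comma -> decimal point
    .str.strip()
    .astype(float)
)

print(limpios.tolist())
```

`regex=False` keeps `$` from being read as a regular-expression anchor.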


In [42]:
# Collect the column names and check them
columnas = list(datos_locacion.columns)
columnas

# Unpack the lists with explode (every column except the first holds lists)
datos_locacion = datos_locacion.explode(columnas[1:])
# Reset the row index
datos_locacion.reset_index(drop=True,inplace=True)
# Inspect the DataFrame
display(datos_locacion)

# Check the data types with info
datos_locacion.info()

# The numeric column is 'valor_aluguel'
datos_locacion['valor_aluguel']

# Start the transformation
# Import the numpy library
import numpy as np
# Remove the text present in the values
# Replace the decimal comma with a point
datos_locacion['valor_aluguel'] = datos_locacion['valor_aluguel'].apply(lambda x: x.replace('$ ', '').replace(' reais', '').replace(',','.').strip())
# Change the data type to float
datos_locacion['valor_aluguel'] = datos_locacion['valor_aluguel'].astype(np.float64)
# Check the transformation
datos_locacion.info()

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101 (blocoAP),01/06/2022,05/06/2022,"$ 1000,0 reais"
1,A101 (blocoAP),01/07/2022,03/07/2022,"$ 2500,0 reais"
2,A102 (blocoAP),02/06/2022,02/06/2022,"$ 1100,0 reais"
3,A102 (blocoAP),02/07/2022,06/07/2022,"$ 2600,0 reais"
4,B201 (blocoAP),03/06/2022,07/06/2022,"$ 1200,0 reais"
5,B201 (blocoAP),03/07/2022,03/07/2022,"$ 2700,0 reais"
6,B202 (blocoAP),04/06/2022,07/06/2022,"$ 1300,0 reais"
7,B202 (blocoAP),04/07/2022,05/07/2022,"$ 2800,0 reais"
8,C301 (blocoAP),05/06/2022,10/06/2022,"$ 1400,0 reais"
9,C301 (blocoAP),05/07/2022,09/07/2022,"$ 2900,0 reais"


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   apartamento                 30 non-null     object
 1   datas_combinadas_pagamento  30 non-null     object
 2   datas_de_pagamento          30 non-null     object
 3   valor_aluguel               30 non-null     object
dtypes: object(4)
memory usage: 1.1+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   apartamento                 30 non-null     object 
 1   datas_combinadas_pagamento  30 non-null     object 
 2   datas_de_pagamento          30 non-null     object 
 3   valor_aluguel               30 non-null     float64
dtypes: float64(1), object(3)
memory usage: 1.1+ KB


In [43]:
display(datos_locacion)

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101 (blocoAP),01/06/2022,05/06/2022,1000.0
1,A101 (blocoAP),01/07/2022,03/07/2022,2500.0
2,A102 (blocoAP),02/06/2022,02/06/2022,1100.0
3,A102 (blocoAP),02/07/2022,06/07/2022,2600.0
4,B201 (blocoAP),03/06/2022,07/06/2022,1200.0
5,B201 (blocoAP),03/07/2022,03/07/2022,2700.0
6,B202 (blocoAP),04/06/2022,07/06/2022,1300.0
7,B202 (blocoAP),04/07/2022,05/07/2022,2800.0
8,C301 (blocoAP),05/06/2022,10/06/2022,1400.0
9,C301 (blocoAP),05/07/2022,09/07/2022,2900.0
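
The two date columns are still `object` at this point. A possible next step, sketched here on two rows copied from the table above, is to parse them with an explicit day-first format and compute the payment delay. The column name `atraso_dias` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "datas_combinadas_pagamento": ["01/06/2022", "02/07/2022"],
    "datas_de_pagamento": ["05/06/2022", "06/07/2022"],
})

# An explicit format avoids reading 01/06/2022 as January 6th
for col in ["datas_combinadas_pagamento", "datas_de_pagamento"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

# Days between the agreed date and the actual payment (hypothetical column)
df["atraso_dias"] = (df["datas_de_pagamento"] - df["datas_combinadas_pagamento"]).dt.days

print(df)
```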


In [44]:
# Convert the Cliente text to lowercase
datos_ventas['Cliente'] = datos_ventas['Cliente'].str.lower()
# Check the result
datos_ventas.head()

# Replacement option - the result of the replacement needs to be checked
# The regex matches every character that is not a letter a-z or a blank space ' '
# Everything the regex matches is removed
datos_ventas['Cliente'].str.replace('[^a-z ]', '', regex=True)

# Apply the replacement to the text column
datos_ventas['Cliente'] = datos_ventas['Cliente'].str.replace('[^a-z ]', '', regex=True).str.strip()
# View the final result
datos_ventas.head()

,Data de venda,Cliente,Valor da compra
0,06/06/2022,ana lucia,836.5
1,06/06/2022,diego armandiu,573.33
2,06/06/2022,diego armandiu,392.8
3,06/06/2022,diego armandiu,512.34
4,07/06/2022,isabely joanes,825.31


In [45]:
# Replacement option - the result of the replacement needs to be checked
# The backslash '\' is needed so the parentheses are treated as literal characters;
# the raw string (r'...') keeps Python from treating '\(' as an invalid string escape
datos_locacion['apartamento'].str.replace(r' \(blocoAP\)', '', regex=True)

# Apply the replacement to the text column
datos_locacion['apartamento'] = datos_locacion['apartamento'].str.replace(r' \(blocoAP\)', '', regex=True)
# View the final result
datos_locacion

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101,01/06/2022,05/06/2022,1000.0
1,A101,01/07/2022,03/07/2022,2500.0
2,A102,02/06/2022,02/06/2022,1100.0
3,A102,02/07/2022,06/07/2022,2600.0
4,B201,03/06/2022,07/06/2022,1200.0
5,B201,03/07/2022,03/07/2022,2700.0
6,B202,04/06/2022,07/06/2022,1300.0
7,B202,04/07/2022,05/07/2022,2800.0
8,C301,05/06/2022,10/06/2022,1400.0
9,C301,05/07/2022,09/07/2022,2900.0
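
Because the text being removed is a fixed suffix, the same result can be sketched without a regular expression by passing `regex=False`, in which case the parentheses need no escaping. The sample values are copied from the earlier output.

```python
import pandas as pd

aptos = pd.Series(["A101 (blocoAP)", "B201 (blocoAP)"])

# regex=False treats the pattern as plain text, so "(" and ")" are literal
limpios = aptos.str.replace(" (blocoAP)", "", regex=False)

print(limpios.tolist())
```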
