## STEP 0: Import libraries and read data

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv(r'dataset_raw\Portabilidad.csv')
df

Unnamed: 0,Año,Mes,Personal,Claro,Nextel,Movistar,Otros,Periodo
0,2012,3,-17,19,0,-2,,Mar-12
1,2012,4,3508,-1134,-189,-2185,,Apr-12
2,2012,5,1253,4792,-853,-5192,,May-12
3,2012,6,3263,-1248,-848,-1167,,Jun-12
4,2012,7,5554,-3847,-1105,-602,,Jul-12
...,...,...,...,...,...,...,...,...
123,2022,6,-3012,20311,,-17348,49.0,Jun-22
124,2022,7,2263,7363,,-9653,27.0,Jul-22
125,2022,8,996,19548,,-20585,41.0,Aug-22
126,2022,9,-2209,18050,,-15982,141.0,Sep-22


In [3]:
# By default pandas does not display all the columns of a DataFrame.
# A fairly high limit is useful during data exploration.
# These options are global, so they can be set at any point after importing pandas.
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)
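If the higher display limits are only wanted for a single cell, `pd.option_context` is one way to apply them temporarily. The following is a minimal sketch; the variable names are chosen here for illustration.

```python
import pandas as pd

# Remember the current setting so the effect of the context manager is visible.
default_max_columns = pd.get_option("display.max_columns")

# Inside the block the limits are raised; on exit the previous values are restored.
with pd.option_context("display.max_columns", 200, "display.max_rows", 200):
    inside_max_columns = pd.get_option("display.max_columns")

restored_max_columns = pd.get_option("display.max_columns")
print(inside_max_columns, restored_max_columns == default_max_columns)
```

This keeps the global configuration untouched, which can help when other cells rely on the default truncated display.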

---

## STEP 1: Data understanding

- DataFrame `info`
- DataFrame `shape`
- `head` and `tail`
- Duplicates

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128 entries, 0 to 127
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Año       128 non-null    int64  
 1   Mes       128 non-null    int64  
 2   Personal  128 non-null    object 
 3   Claro     128 non-null    object 
 4   Nextel    70 non-null     object 
 5   Movistar  128 non-null    object 
 6   Otros     8 non-null      float64
 7   Periodo   128 non-null    object 
dtypes: float64(1), int64(2), object(5)
memory usage: 8.1+ KB


In [5]:
df.shape

(128, 8)

In [6]:
# Check for fully duplicated rows (all columns are compared by default)
duplicates = df[df.duplicated(keep=False)]
duplicates

Unnamed: 0,Año,Mes,Personal,Claro,Nextel,Movistar,Otros,Periodo


In [7]:
df.drop_duplicates(inplace=True)
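Full-row duplicates are not the only possibility: the same month could appear twice with different values. Below is a minimal sketch of that stricter check, assuming each (`Año`, `Mes`) pair should appear only once. The sample rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Invented sample: the (2012, 4) period appears twice with different values.
sample = pd.DataFrame({
    "Año": [2012, 2012, 2012],
    "Mes": [3, 4, 4],
    "Personal": [-17, 3508, 3500],
})

# No row is a full duplicate of another...
full_duplicates = sample[sample.duplicated(keep=False)]

# ...but two rows share the same period key.
period_duplicates = sample[sample.duplicated(subset=["Año", "Mes"], keep=False)]
print(period_duplicates)
```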

In [8]:
df.columns

Index(['Año', 'Mes', 'Personal', 'Claro', 'Nextel', 'Movistar', 'Otros',
       'Periodo'],
      dtype='object')

In [9]:
df.head(5)

Unnamed: 0,Año,Mes,Personal,Claro,Nextel,Movistar,Otros,Periodo
0,2012,3,-17,19,0,-2,,Mar-12
1,2012,4,3508,-1134,-189,-2185,,Apr-12
2,2012,5,1253,4792,-853,-5192,,May-12
3,2012,6,3263,-1248,-848,-1167,,Jun-12
4,2012,7,5554,-3847,-1105,-602,,Jul-12


---

## STEP 2: Data preparation

- Remove irrelevant columns and rows
- Rename columns
- Identify duplicated columns and rows
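Renaming is listed among the steps, although the cells that follow keep the original column names. If renaming were wanted, it could look roughly like this; the target names `year` and `month` are hypothetical and the one-row sample is invented.

```python
import pandas as pd

# Invented one-row sample using three of the dataset's column names.
sample = pd.DataFrame({"Año": [2012], "Mes": [3], "Personal": [-17]})

# Hypothetical target names; columns not listed in the mapping are left unchanged.
renamed = sample.rename(columns={"Año": "year", "Mes": "month"})
print(renamed.columns.tolist())
```

`rename` returns a new DataFrame, so `sample` keeps its original column names.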


In [10]:
# Keep only the columns needed from here on (`Periodo` repeats the information in `Año` and `Mes`)
df = df[['Año', 'Mes', 'Personal', 'Claro', 'Nextel', 'Movistar', 'Otros']]

df.head(5)

Unnamed: 0,Año,Mes,Personal,Claro,Nextel,Movistar,Otros
0,2012,3,-17,19,0,-2,
1,2012,4,3508,-1134,-189,-2185,
2,2012,5,1253,4792,-853,-5192,
3,2012,6,3263,-1248,-848,-1167,
4,2012,7,5554,-3847,-1105,-602,


In [11]:
df.isna().sum()   # Count nulls per column

Año           0
Mes           0
Personal      0
Claro         0
Nextel       58
Movistar      0
Otros       120
dtype: int64
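The counts say how many values are missing but not where. The earlier previews suggest that `Otros` is empty in the first rows and `Nextel` in the last ones. A minimal sketch for checking the range each column covers, run here on invented sample values:

```python
import numpy as np
import pandas as pd

# Invented sample: one column stops reporting, the other starts late.
sample = pd.DataFrame({
    "Nextel": [0, -189, np.nan, np.nan],
    "Otros": [np.nan, np.nan, 49.0, 27.0],
})

# First and last index label at which each column has a non-null value.
coverage = {
    col: (sample[col].first_valid_index(), sample[col].last_valid_index())
    for col in sample.columns
}
print(coverage)
```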

In [12]:
# Replace nulls with 0; reassigning avoids the SettingWithCopyWarning that inplace=True raises on a sliced DataFrame
df = df.fillna(0)


In [13]:
# Strip thousands separators and cast to int; assign() returns a new DataFrame, so no SettingWithCopyWarning is raised
df = df.assign(**df[['Personal', 'Claro', 'Nextel', 'Movistar']].replace(",", "", regex=True).astype(int))
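The `replace(",", "", regex=True)` step above suggests that the raw file writes numbers with a comma as thousands separator. Assuming that is the case, `read_csv` can parse such values directly through its `thousands` parameter. A minimal sketch on invented sample data, not on the real file:

```python
import io

import pandas as pd

# Invented sample in which quoted values use "," as thousands separator.
raw_csv = io.StringIO(
    'Año,Mes,Personal,Claro\n'
    '2022,6,"-3,012","20,311"\n'
    '2022,7,"2,263","7,363"\n'
)

# thousands="," tells the parser to ignore the separator inside numeric fields.
parsed = pd.read_csv(raw_csv, thousands=",")
print(parsed.dtypes)
```

With this approach the operator columns arrive as integers and the later string cleanup is not needed, provided no other formatting quirks are present in the file.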


In [14]:
df.dtypes

Año           int64
Mes           int64
Personal      int32
Claro         int32
Nextel        int32
Movistar      int32
Otros       float64
dtype: object
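`Otros` stays `float64` because it contained nulls when it was read. If missing values should remain missing rather than become 0, the nullable `Int64` dtype is one option. A minimal sketch with invented values:

```python
import numpy as np
import pandas as pd

# Invented sample with one missing value.
otros = pd.Series([np.nan, 49.0, 27.0], name="Otros")

# "Int64" (capital I) is the nullable integer dtype: it stores whole numbers and <NA>.
otros_nullable = otros.astype("Int64")
print(otros_nullable)
```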

In [15]:
df.head()

Unnamed: 0,Año,Mes,Personal,Claro,Nextel,Movistar,Otros
0,2012,3,-17,19,0,-2,0.0
1,2012,4,3508,-1134,-189,-2185,0.0
2,2012,5,1253,4792,-853,-5192,0.0
3,2012,6,3263,-1248,-848,-1167,0.0
4,2012,7,5554,-3847,-1105,-602,0.0


In [16]:
# Save the DataFrame as a CSV file
df.to_csv(r'..\EDA\dataset_processed\Portabilidad_prces.csv', index=False)
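A quick way to confirm that a saved file can be read back unchanged is a round trip. The sketch below uses a temporary directory, an invented sample and a hypothetical file name, so it does not touch the real output path.

```python
import os
import tempfile

import pandas as pd

# Invented sample with the same kind of columns as the processed data.
sample = pd.DataFrame({"Año": [2012, 2012], "Mes": [3, 4], "Personal": [-17, 3508]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside the temporary directory.
    out_path = os.path.join(tmp_dir, "sample_processed.csv")
    sample.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

# check_dtype=False because integer widths may differ after the round trip.
pd.testing.assert_frame_equal(sample, reloaded, check_dtype=False)
```

`assert_frame_equal` raises an `AssertionError` if values or column names differ, and passes silently otherwise.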