## Challenge 1: Casting

### 1. Objectives:
    - Apply several casting techniques to a new dataset
 
---
    
### 2. Development:

#### a) Transforming data types

We will work with a slightly modified version of the dataset you created in the previous session. As you may recall, at the end of that session we automated a Python program to obtain a `DataFrame` with all the objects that passed close to Earth in January and February 1995. To build this dataset, we used the free API offered by [NASA](https://api.nasa.gov/).

I took the liberty of modifying that dataset slightly so that it serves the purposes of this session better. You will find the modified version at the path '../../Datasets/near_earth_objects-jan_feb_1995-dirty.csv'. You will do all the Challenges in this session with that dataset.

I recommend that when you finish each challenge you save the new modified version of your dataset under a name that indicates the completed challenge (for example, 'near_earth_objects-jan_feb_1995-reto_1.csv'), so that you can work incrementally through the challenges without repeating steps. You can save datasets in `csv` format with the method `DataFrame.to_csv('path')`.
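As a minimal sketch of that save-and-reload cycle, the example below writes a small frame to CSV and reads it back. The frame `df_demo` and the in-memory buffer are assumptions made for illustration; the real dataset lives in a file. It shows two details that matter when chaining challenges: `to_csv` also writes the index, and dates come back as plain text unless they are parsed again.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the real dataset
df_demo = pd.DataFrame({
    "id_name": ["3153509-(2003 HM)", "3516633-(2010 HA)"],
    "close_approach_date": pd.to_datetime(["1995-01-07", "1995-01-07"]),
})

# Write to an in-memory buffer instead of a file; to_csv also writes the index
buffer = io.StringIO()
df_demo.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index instead of adding an "Unnamed: 0" column;
# parse_dates re-parses the date column, which a CSV stores as plain text
df_back = pd.read_csv(buffer, index_col=0, parse_dates=["close_approach_date"])
print(df_back.dtypes)
```

Reading without `index_col=0` would keep the old index as an extra column each time the file is saved and reloaded.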

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import pandas as pd
df = pd.read_csv('drive/MyDrive/BEDU/Datasets/near_earth_objects-jan_feb_1995-dirty.csv', index_col=0)

In [6]:
df.head()

Unnamed: 0,id_name,is_potentially_hazardous_asteroid,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,close_approach_date,epoch_date_close_approach,orbiting_body,relative_velocity.kilometers_per_second,relative_velocity.kilometers_per_hour,orbit_class_description
0,2154652-154652 (2004 EP20),False,483.676488,1081.533507,1995-01-07,789467580000,earth,16.142864,58114.3086669449,Near-Earth-asteroid-orbits-similar-to-that-o...
1,3153509-(2003 HM),True,96.506147,215.794305,1995-01-07,789491340000,earth,12.351044,44463.7577343496,Near-Earth-asteroid-orbits-which-cross-the-E...
2,3516633-(2010 HA),False,44.11182,98.637028,1995-01-07,789446820000,earth,6.220435,Unknown,Near-Earth-asteroid-orbits-similar-to-that-o...
3,3837644-(2019 AY3),False,46.190746,103.285648,1995-01-07,789513900000,earth,22.478615,80923.0150213416,Near-Earth-asteroid-orbits-similar-to-that-o...
4,3843493-(2019 PY),False,22.108281,49.435619,1995-01-07,789446700000,earth,4.998691,17995.2883553078,Near-Earth-asteroid-orbits-similar-to-that-of...


In [7]:
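df.info  # without parentheses this only displays the bound method (see below); call df.info() for the column summary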

<bound method DataFrame.info of                          id_name  ...                            orbit_class_description
0     2154652-154652 (2004 EP20)  ...    Near-Earth-asteroid-orbits-similar-to-that-o...
1              3153509-(2003 HM)  ...    Near-Earth-asteroid-orbits-which-cross-the-E...
2              3516633-(2010 HA)  ...    Near-Earth-asteroid-orbits-similar-to-that-o...
3             3837644-(2019 AY3)  ...    Near-Earth-asteroid-orbits-similar-to-that-o...
4              3843493-(2019 PY)  ...   Near-Earth-asteroid-orbits-similar-to-that-of...
..                           ...  ...                                                ...
328  2267136-267136 (2000 EF104)  ...    Near-Earth-asteroid-orbits-similar-to-that-o...
329           3360486-(2006 WE4)  ...    Near-Earth-asteroid-orbits-which-cross-the-E...
330           3656919-(2014 BG3)  ...   An-asteroid-orbit-contained-entirely-within-t...
331           3803762-(2018 GY4)  ...    Near-Earth-asteroid-orbits-similar-to

In [30]:
df.dtypes

id_name                                                     object
is_potentially_hazardous_asteroid                             bool
estimated_diameter.meters.estimated_diameter_min           float64
estimated_diameter.meters.estimated_diameter_max           float64
close_approach_date                                 datetime64[ns]
epoch_date_close_approach                           datetime64[ns]
orbiting_body                                               object
relative_velocity.kilometers_per_second                    float64
relative_velocity.kilometers_per_hour                      float64
orbit_class_description                                     object
dtype: object

Your first Challenge consists of the following steps:

1. Read the dataset and create a `DataFrame` from it.
2. Do a brief exploration to familiarize yourself with it.
3. Convert the column `relative_velocity.kilometers_per_hour` from `object` to `float64`.
4. Convert the column `close_approach_date` to the `datetime64[ms]` data type using the `astype` method and a conversion dictionary.
5. Convert the column `epoch_date_close_approach` to the `datetime64[ms]` data type using the `to_datetime` method.
6. Assign the resulting `DataFrame` to the variable `df_reto_1`.
7. Save your result to a .csv file.
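Steps 3 to 5 can be tried first on a tiny frame, as in this rough sketch. The frame `df_demo` is an assumption built from rows 0 and 2 of the preview above; it is not the full dataset. Depending on the pandas version, the resulting datetime resolution may be reported as `ms` or `ns`.

```python
import pandas as pd

# Hypothetical frame; values taken from rows 0 and 2 of the preview above
df_demo = pd.DataFrame({
    "relative_velocity.kilometers_per_hour": ["58114.3086669449", "Unknown"],
    "close_approach_date": ["1995-01-07", "1995-01-07"],
    "epoch_date_close_approach": [789467580000, 789446820000],
})

# Step 3: strings that cannot be parsed (such as 'Unknown') become NaN
df_demo["relative_velocity.kilometers_per_hour"] = pd.to_numeric(
    df_demo["relative_velocity.kilometers_per_hour"], errors="coerce"
)

# Step 4: astype with a {column: dtype} dictionary
df_demo = df_demo.astype({"close_approach_date": "datetime64[ms]"})

# Step 5: integers interpreted as milliseconds since the Unix epoch
df_demo["epoch_date_close_approach"] = pd.to_datetime(
    df_demo["epoch_date_close_approach"], unit="ms"
)

print(df_demo.dtypes)
```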

In [11]:
df['relative_velocity.kilometers_per_hour'] = pd.to_numeric(df['relative_velocity.kilometers_per_hour'], errors='coerce')

In [15]:
diccionario_de_conversion={
    "close_approach_date":"datetime64[ms]"
}

In [16]:
df = df.astype(diccionario_de_conversion)

In [29]:
df['epoch_date_close_approach'] = pd.to_datetime(df['epoch_date_close_approach'], unit='ms')

Ask your instructor for the verification function `checar_conversiones` (found in the `helpers.py` file in this Challenge's folder), paste it below, and run the cell to verify your result:

In [26]:
df_reto_1 = df.copy()
df_reto_1.to_csv('drive/MyDrive/BEDU/df_reto_1.csv')

In [28]:
def checar_conversiones(df_reto_1):
    
    import pandas as pd
    import pandas.api.types as ptypes
    
    assert ptypes.is_float_dtype(df_reto_1['relative_velocity.kilometers_per_hour']), 'Careful... The column `relative_velocity.kilometers_per_hour` is not of type `float64`'
    assert ptypes.is_datetime64_any_dtype(df_reto_1['close_approach_date']), 'Careful... The column `close_approach_date` is not of type `datetime64[ns]`'
    assert ptypes.is_datetime64_any_dtype(df_reto_1['epoch_date_close_approach']), 'Careful... The column `epoch_date_close_approach` is not of type `datetime64[ns]`'
    
    print('Success! All your conversions were carried out correctly!')

checar_conversiones(df_reto_1)

Success! All your conversions were carried out correctly!


<details><summary>Solution</summary>

```python
df = pd.read_csv('../../Datasets/near_earth_objects-jan_feb_1995-dirty.csv', index_col=0)
df['relative_velocity.kilometers_per_hour'] = pd.to_numeric(df['relative_velocity.kilometers_per_hour'], errors='coerce')
df = df.dropna(axis=0).reset_index(drop=True)
df['relative_velocity.kilometers_per_hour'] = df['relative_velocity.kilometers_per_hour'].astype(float)
diccionario_de_conversion = {
    'close_approach_date': 'datetime64[ms]'
}
df = df.astype(diccionario_de_conversion)
df['epoch_date_close_approach'] = pd.to_datetime(df['epoch_date_close_approach'], unit='ms')
df_reto_1 = df.copy()
```
    
</details>

In [41]:
df_reto_2 = pd.read_csv("drive/MyDrive/BEDU/df_reto_1.csv")

In [46]:
df_reto_2["orbit_class_description"]

0      Near Earth asteroid orbits similar to that of ...
1      Near Earth asteroid orbits which cross the Ear...
2      Near Earth asteroid orbits similar to that of ...
3      Near Earth asteroid orbits similar to that of ...
4      Near Earth asteroid orbits similar to that of ...
                             ...                        
328    Near Earth asteroid orbits similar to that of ...
329    Near Earth asteroid orbits which cross the Ear...
330    An asteroid orbit contained entirely within th...
331    Near Earth asteroid orbits similar to that of ...
332    Near Earth asteroid orbits similar to that of ...
Name: orbit_class_description, Length: 333, dtype: object

In [45]:
df_reto_2["orbit_class_description"]=df_reto_2['orbit_class_description'].str.replace('-', ' ')
df_reto_2["orbit_class_description"]=df_reto_2["orbit_class_description"].str.strip()
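The same cleanup can be checked on a couple of strings before running it on the whole column. This is a small sketch; the series `descriptions` holds made-up values that imitate the dirty column (hyphens plus stray whitespace). Passing `regex=False` is an optional choice that makes the hyphen a literal character regardless of the pandas version.

```python
import pandas as pd

# Hypothetical values imitating the dirty column
descriptions = pd.Series([
    " Near-Earth-asteroid-orbits ",
    "An-asteroid-orbit",
])

# Replace every hyphen with a space, then trim leading/trailing whitespace
cleaned = descriptions.str.replace("-", " ", regex=False).str.strip()
print(cleaned.tolist())
```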

In [43]:
# n=1 splits only at the first hyphen, so the result always has exactly two columns
df_reto_2[['id', 'name']] = df_reto_2['id_name'].str.split('-', n=1, expand=True)

0      Near-Earth-asteroid-orbits-similar-to-that-of-...
1      Near-Earth-asteroid-orbits-which-cross-the-Ear...
2      Near-Earth-asteroid-orbits-similar-to-that-of-...
3      Near-Earth-asteroid-orbits-similar-to-that-of-...
4      Near-Earth-asteroid-orbits-similar-to-that-of-...
                             ...                        
328    Near-Earth-asteroid-orbits-similar-to-that-of-...
329    Near-Earth-asteroid-orbits-which-cross-the-Ear...
330    An-asteroid-orbit-contained-entirely-within-th...
331    Near-Earth-asteroid-orbits-similar-to-that-of-...
332    Near-Earth-asteroid-orbits-similar-to-that-of-...
Name: orbit_class_description, Length: 333, dtype: object
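The effect of limiting the split to the first hyphen can be seen in this brief sketch. The first two values come from the preview of `id_name`; the third is a hypothetical name containing a second hyphen, added only to show why the limit matters.

```python
import pandas as pd

id_names = pd.Series([
    "2154652-154652 (2004 EP20)",
    "3153509-(2003 HM)",
    "1234567-(2001 AB-1)",  # hypothetical value with a second hyphen
])

# n=1 stops after the first hyphen; expand=True returns a DataFrame
parts = id_names.str.split("-", n=1, expand=True)
parts.columns = ["id", "name"]
print(parts)
```

Without `n=1`, the third value would produce three pieces, and assigning the result to two columns would fail.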

In [52]:
df_reto_2["orbiting_body"]= df_reto_2["orbiting_body"].str.title()

In [53]:
df_reto_2.to_csv('drive/MyDrive/BEDU/df_reto_2.csv')

In [54]:
df_reto_3 = pd.read_csv("drive/MyDrive/BEDU/df_reto_2.csv")

In [55]:
df_reto_3["is_potentially_hazardous_asteroid"]

0      False
1       True
2      False
3      False
4      False
       ...  
328    False
329    False
330    False
331    False
332    False
Name: is_potentially_hazardous_asteroid, Length: 333, dtype: bool

In [60]:
dicbol={
    False:0,
    True:1
}


In [64]:
df_reto_3['is_potentially_hazardous_asteroid']=df_reto_3['is_potentially_hazardous_asteroid'].map(dicbol)
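For comparison, here is a small sketch of two ways to turn booleans into integers: the dictionary lookup used above and a direct cast. The series `hazardous` is a made-up example, not the real column.

```python
import pandas as pd

# Hypothetical boolean column
hazardous = pd.Series([False, True, False])

# Option 1: explicit lookup table, as in the cell above
mapped = hazardous.map({False: 0, True: 1})

# Option 2: cast directly; False becomes 0 and True becomes 1
casted = hazardous.astype("int64")

print(mapped.tolist(), casted.tolist())
```

The dictionary makes the correspondence explicit, which helps when the target values are not simply 0 and 1; any value missing from the dictionary would become `NaN`.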

In [65]:
df_reto_3['is_potentially_hazardous_asteroid']

0      0
1      1
2      0
3      0
4      0
      ..
328    0
329    0
330    0
331    0
332    0
Name: is_potentially_hazardous_asteroid, Length: 333, dtype: int64

In [69]:
def aminutos(val):
  return val/60

In [70]:
df_reto_3["relative_velocity.kilometers_per_hour"].map(aminutos)

0       968.571811
1       741.062629
2              NaN
3      1348.716917
4       299.921473
          ...     
328     970.823503
329     906.368411
330    1220.590392
331    1783.945551
332            NaN
Name: relative_velocity.kilometers_per_hour, Length: 333, dtype: float64

In [71]:
df_reto_3["relative_velocity.kilometers_per_minute"]=df_reto_3["relative_velocity.kilometers_per_hour"].map(aminutos)
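The element-wise `map` above can also be written as a single vectorized division, as this short sketch suggests. The series `km_per_hour` and the helper `to_per_minute` are illustrative assumptions; the missing value stands in for the rows where the speed was 'Unknown'.

```python
import numpy as np
import pandas as pd

# Hypothetical speeds in km/h, with one missing value
km_per_hour = pd.Series([58114.308667, np.nan, 17995.288355])

def to_per_minute(value):
    # One hour has 60 minutes
    return value / 60

via_map = km_per_hour.map(to_per_minute)   # calls the function once per element
vectorized = km_per_hour / 60              # one operation over the whole column

# equals treats NaN values in the same position as equal
print(via_map.equals(vectorized))
```

Both forms propagate `NaN`, so rows without a known speed stay missing in the new column.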

In [74]:
df_reto_3.to_csv('drive/MyDrive/BEDU/df_reto_3.csv')

In [75]:
def revisar_resultados(df_reto_3):
    
    import pandas.api.types as pdtypes
    
    assert pdtypes.is_int64_dtype(df_reto_3['is_potentially_hazardous_asteroid']), 'The column "is_potentially_hazardous_asteroid" has not been converted to a numeric type'
    assert len(df_reto_3['is_potentially_hazardous_asteroid'].unique()) == 2, 'There was an error mapping boolean values to numeric ones. There are more than two possible values in the resulting column'
    assert df_reto_3['relative_velocity.kilometers_per_minute'].equals(df_reto_3['relative_velocity.kilometers_per_hour'] / 60), 'The conversion from kilometers per hour to kilometers per minute was not done correctly'
    
    print('All processes were completed successfully!')

revisar_resultados(df_reto_3)

All processes were completed successfully!


In [76]:
df_reto_4 = pd.read_csv("drive/MyDrive/BEDU/df_reto_3.csv")

In [78]:
def diametro(val):
  # 12,742,000 m is approximately Earth's mean diameter
  return val/12742000

In [77]:
df_reto_4

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Unnamed: 0.1.1,id_name,is_potentially_hazardous_asteroid,estimated_diameter.meters.estimated_diameter_min,estimated_diameter.meters.estimated_diameter_max,close_approach_date,epoch_date_close_approach,orbiting_body,relative_velocity.kilometers_per_second,relative_velocity.kilometers_per_hour,orbit_class_description,relative_velocity.kilometers_per_minute
0,0,0,0,2154652-154652 (2004 EP20),0,483.676488,1081.533507,1995-01-07,1995-01-07 08:33:00,Earth,16.142864,58114.308667,Near Earth asteroid orbits similar to that of ...,968.571811
1,1,1,1,3153509-(2003 HM),1,96.506147,215.794305,1995-01-07,1995-01-07 15:09:00,Earth,12.351044,44463.757734,Near Earth asteroid orbits which cross the Ear...,741.062629
2,2,2,2,3516633-(2010 HA),0,44.111820,98.637028,1995-01-07,1995-01-07 02:47:00,Earth,6.220435,,Near Earth asteroid orbits similar to that of ...,
3,3,3,3,3837644-(2019 AY3),0,46.190746,103.285648,1995-01-07,1995-01-07 21:25:00,Earth,22.478615,80923.015021,Near Earth asteroid orbits similar to that of ...,1348.716917
4,4,4,4,3843493-(2019 PY),0,22.108281,49.435619,1995-01-07,1995-01-07 02:45:00,Earth,4.998691,17995.288355,Near Earth asteroid orbits similar to that of ...,299.921473
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
328,328,328,328,2267136-267136 (2000 EF104),0,441.118200,986.370281,1995-02-21,1995-02-21 04:17:00,Earth,16.180392,58249.410194,Near Earth asteroid orbits similar to that of ...,970.823503
329,329,329,329,3360486-(2006 WE4),0,441.118200,986.370281,1995-02-21,1995-02-21 15:44:00,Earth,15.106140,54382.104639,Near Earth asteroid orbits which cross the Ear...,906.368411
330,330,330,330,3656919-(2014 BG3),0,160.160338,358.129403,1995-02-21,1995-02-21 12:08:00,Earth,20.343173,73235.423517,An asteroid orbit contained entirely within th...,1220.590392
331,331,331,331,3803762-(2018 GY4),0,421.264611,941.976306,1995-02-21,1995-02-21 12:54:00,Earth,29.732426,107036.733058,Near Earth asteroid orbits similar to that of ...,1783.945551


In [80]:
df_reto_4["proportion_of_max_diameter_to_earth"]=df_reto_4['estimated_diameter.meters.estimated_diameter_max'].apply(diametro)

In [83]:
df_reto_4.to_csv('drive/MyDrive/BEDU/df_reto_4.csv')

In [82]:
def revisar_aplicacion(df_reto_4):
    
    assert 'proportion_of_max_diameter_to_earth' in df_reto_4, 'There is no column named "proportion_of_max_diameter_to_earth" in the DataFrame'
    assert df_reto_4['proportion_of_max_diameter_to_earth'].equals(df_reto_4['estimated_diameter.meters.estimated_diameter_max'] / 12742000), 'The transformation was not done correctly'
    
    print('The transformation and creation of a new column were completed successfully!')

revisar_aplicacion(df_reto_4)

The transformation and creation of a new column were completed successfully!


In [84]:
df_reto_5 = pd.read_csv("drive/MyDrive/BEDU/df_reto_4.csv")

In [86]:
filtro_1_1= df_reto_5["is_potentially_hazardous_asteroid"] == True 
filtro_1_2=df_reto_5["is_potentially_hazardous_asteroid"]==1

In [88]:
df_hazardous = df_reto_5[filtro_1_1 | filtro_1_2]

In [91]:
df_bigger_than_1000 = df_reto_5[df_reto_5["estimated_diameter.meters.estimated_diameter_max"]>1000]
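Boolean masks like the ones above can also be combined. The sketch below assumes a tiny frame `df_demo`: the first two rows reuse values from the preview and the third is made up. It also shows that once the column holds 0 and 1, comparing against `True` or against `1` selects the same rows.

```python
import pandas as pd

# Hypothetical frame; the third row is invented for illustration
df_demo = pd.DataFrame({
    "is_potentially_hazardous_asteroid": [0, 1, 1],
    "estimated_diameter.meters.estimated_diameter_max": [1081.533507, 215.794305, 1500.0],
})

hazardous_mask = df_demo["is_potentially_hazardous_asteroid"] == 1
big_mask = df_demo["estimated_diameter.meters.estimated_diameter_max"] > 1000

# & keeps rows where both conditions hold; | keeps rows where either holds
hazardous_and_big = df_demo[hazardous_mask & big_mask]
print(hazardous_and_big)

# For an integer column of 0/1, == True and == 1 give the same mask
same_mask = (df_demo["is_potentially_hazardous_asteroid"] == True).equals(hazardous_mask)
print(same_mask)
```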

In [99]:
pd.to_datetime(df_reto_5["epoch_date_close_approach"])=='1995-02'  # an equality test against a single value, not a month filter, so every row is False

0      False
1      False
2      False
3      False
4      False
       ...  
328    False
329    False
330    False
331    False
332    False
Name: epoch_date_close_approach, Length: 333, dtype: bool

In [100]:
df_reto_5["epoch_date_close_approach"]

0      1995-01-07 08:33:00
1      1995-01-07 15:09:00
2      1995-01-07 02:47:00
3      1995-01-07 21:25:00
4      1995-01-07 02:45:00
              ...         
328    1995-02-21 04:17:00
329    1995-02-21 15:44:00
330    1995-02-21 12:08:00
331    1995-02-21 12:54:00
332    1995-02-21 22:15:00
Name: epoch_date_close_approach, Length: 333, dtype: object
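Because the column comes back from the CSV with dtype `object`, one possible way to select a month is to parse the text first and then use the `.dt` accessor. This is a minimal sketch on a made-up series `stamps`, with values copied from the output above.

```python
import pandas as pd

# Timestamps stored as text, as they are after reading the CSV
stamps = pd.Series([
    "1995-01-07 08:33:00",
    "1995-02-21 04:17:00",
    "1995-02-21 15:44:00",
])

# Parse to datetime, then compare year and month separately
parsed = pd.to_datetime(stamps)
february_mask = (parsed.dt.year == 1995) & (parsed.dt.month == 2)

print(stamps[february_mask].tolist())
```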

In [None]:
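# February 1995: parse the timestamps (read back from the CSV as text) and filter by year and month
fechas = pd.to_datetime(df_reto_5["epoch_date_close_approach"])
df_february = df_reto_5[(fechas.dt.year == 1995) & (fechas.dt.month == 2)]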