# Exploratory Data Analysis


You are provided with hourly rental data spanning two years. For this competition, the training set comprises the first 19 days of each month, while the test set covers the 20th through the end of the month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
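
The day-of-month split described above can be sketched on synthetic timestamps. This is only an illustration: the `hours` frame and the names `train_part` and `test_part` are hypothetical, and the real competition files are already split this way.

```python
import pandas as pd

# Synthetic hourly timestamps for January 2011 (illustrative, not the real data).
hours = pd.DataFrame(
    {"datetime": pd.date_range("2011-01-01", "2011-01-31 23:00", freq="h")}
)

# Days 1-19 of each month fall in the training period; day 20 onward is the test period.
is_train = hours["datetime"].dt.day <= 19
train_part = hours[is_train]
test_part = hours[~is_train]

print(len(train_part), len(test_part))
```

Because the test hours sit at the end of every month, any validation scheme that mimics this layout would hold out the last days of each month rather than a random sample.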

Data Fields
* `datetime` - hourly date + timestamp
* `season` -  1 = spring, 2 = summer, 3 = fall, 4 = winter
* `holiday` - whether the day is considered a holiday
* `workingday` - whether the day is neither a weekend nor holiday
* `weather` -
    * 1: Clear, Few clouds, Partly cloudy
    * 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    * 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    * 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
* `temp` - temperature in Celsius
* `atemp` - "feels like" temperature in Celsius
* `humidity` - relative humidity
* `windspeed` - wind speed
* `casual` - number of non-registered user rentals initiated
* `registered` - number of registered user rentals initiated
* `count` - number of total rentals

## Libraries

In [None]:
import seaborn as sns
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

In [None]:
sns.set_style("darkgrid")

## Import the Dataset

In [None]:
# data = pd.read_csv("/kaggle/input/bike-sharing-demand/train.csv")
data = pd.read_csv("../data/train.csv")
data["datetime"] = pd.to_datetime(data["datetime"])

In [None]:
data.head()

In [None]:
data.columns

* `datetime` - hourly date + timestamp

In [None]:
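df = data.copy()
df.set_index("datetime", inplace=True)
# asfreq returns a new DataFrame on an hourly grid and does not modify df in place.
# The result is not assigned here, so df keeps only the hours present in the file.
df.asfreq("1H")
df.head()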

In [None]:
len(data), len(df)
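
Note that `asfreq` returns a new DataFrame rather than modifying the original, so the length comparison only shows a difference if the result is kept. A minimal sketch on a toy series (the values and the names `toy` and `regular` are made up for illustration) shows how reindexing to an hourly grid adds rows for missing hours:

```python
import pandas as pd

# Toy hourly series with the 02:00 reading missing (illustrative values).
idx = pd.to_datetime(["2011-01-01 00:00", "2011-01-01 01:00", "2011-01-01 03:00"])
toy = pd.DataFrame({"count": [16, 40, 13]}, index=idx)

# asfreq builds a regular hourly index and fills the gaps with NaN.
regular = toy.asfreq("h")

print(len(toy), len(regular))
print(regular["count"].isna().sum())
```

On the training data, the same operation would presumably insert NaN rows for every hour from the 20th to the end of each month, since those hours belong to the test set.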

* `season` -  1 = spring, 2 = summer, 3 = fall, 4 = winter

In [None]:
df.season.unique()

* `holiday` - whether the day is considered a holiday

In [None]:
df.holiday.unique()

* `workingday` - whether the day is neither a weekend nor holiday

In [None]:
df.workingday.unique()
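
The definition above (neither a weekend nor a holiday) can be checked by rebuilding the flag from the timestamp. This is a sketch on a toy frame, assuming weekends are Saturday and Sunday; the column `workingday_derived` is a hypothetical name.

```python
import pandas as pd

# Toy rows: a Friday, a Saturday, and a Monday flagged as a holiday (illustrative).
toy = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2011-01-14 09:00", "2011-01-15 09:00", "2011-01-17 09:00"]
        ),
        "holiday": [0, 0, 1],
    }
)

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
is_weekend = toy["datetime"].dt.dayofweek >= 5
toy["workingday_derived"] = (~is_weekend & (toy["holiday"] == 0)).astype(int)

print(toy)
```

Comparing such a derived column against the provided `workingday` column is one way to confirm that the field follows the documented definition.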

* `weather` -
    * 1: Clear, Few clouds, Partly cloudy
    * 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    * 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    * 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

In [None]:
df.weather.unique()

* `temp` - temperature in Celsius
* `atemp` - "feels like" temperature in Celsius

In [None]:
df.temp.describe()

In [None]:
df.atemp.describe()

Comparison of the actual temperature vs. the "feels like" temperature

In [None]:
from datetime import timedelta
ti = df.index[0]
tf = ti + timedelta(days=365)

In [None]:
_ = plt.figure(figsize=(15, 7))
df.temp[ti:tf].plot()
df.atemp[ti:tf].plot()
plt.legend()
plt.show()

* `humidity` - relative humidity

In [None]:
_ = plt.figure(figsize=(15, 7))
df.humidity.plot()
plt.show()

* `windspeed` - wind speed

In [None]:
_ = plt.figure(figsize=(15, 7))
df.windspeed.plot()
plt.show()

* `casual` - number of non-registered user rentals initiated
* `registered` - number of registered user rentals initiated
* `count` - number of total rentals

In [None]:
_ = plt.figure(figsize=(15, 7))
df["count"].plot()
df.registered.plot()
df.casual.plot()
plt.legend()
plt.show()
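
Since `count` is described as the total number of rentals, it is reasonable to assume it equals `casual + registered`. The field list does not state this explicitly, so the check below is a sketch on made-up values; the names `toy` and `mismatch` are hypothetical.

```python
import pandas as pd

# Toy rentals table (illustrative values, not taken from the dataset).
toy = pd.DataFrame(
    {"casual": [3, 8, 5], "registered": [13, 32, 27], "count": [16, 40, 32]}
)

# Rows where the two user types do not add up to the reported total.
mismatch = toy[toy["casual"] + toy["registered"] != toy["count"]]

print(len(mismatch))
```

Running the same comparison on `data` would show whether the assumption holds for every row.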

In [None]:
# Total time span covered by the index, divided by 7 (the result is a Timedelta)
(df.index[-1] - df.index[0])/7

In [None]:
sns.pairplot(data)
plt.show()

In [None]:
holiday = data[data.holiday == 1]
no_holiday = data[data.holiday == 0]
working = data[data.workingday == 1]
no_working = data[data.workingday == 0]

In [None]:
cols = ["casual", "registered"]
no_holiday[cols].hist();

In [None]:
holiday[cols].hist();

In [None]:
working[cols].hist(bins=200);

In [None]:
no_working[cols].hist(bins=200);

In [None]:
data[cols].hist(bins=200);
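
As a numeric companion to the histograms, the subsets can also be summarized with a `groupby`. The sketch below uses a toy frame with made-up values; `summary` is a hypothetical name, and the mean is only one possible statistic.

```python
import pandas as pd

# Toy rows for working and non-working days (illustrative values only).
toy = pd.DataFrame(
    {
        "workingday": [1, 1, 0, 0],
        "casual": [4, 6, 30, 50],
        "registered": [100, 140, 60, 80],
    }
)

# Mean hourly rentals per user type, split by the workingday flag.
summary = toy.groupby("workingday")[["casual", "registered"]].mean()

print(summary)
```

Applied to `data`, grouping by `workingday` or `holiday` gives one row per group instead of separate filtered DataFrames.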

In [None]:
data.columns

In [None]:
%matplotlib inline

In [None]:
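plt.figure(figsize=(20, 5))
# Overlay hourly casual rentals and wind speed as point clouds on a shared time axis
plt.plot(data["datetime"].values, data["casual"].values, ".", label="casual")
plt.plot(data["datetime"].values, data["windspeed"].values, ".", label="windspeed")
plt.legend()
plt.show()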
