Overview
Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems because of their important role in traffic, environmental, and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data they generate make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.


Attribute Information

* instant: record index
* dteday : date
* season : season (1: spring, 2: summer, 3: fall, 4: winter)
* yr : year (0: 2011, 1:2012)
* mnth : month ( 1 to 12)
* hr : hour (0 to 23)
* holiday : whether the day is a holiday or not
* weekday : day of the week
* workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
* weathersit :
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
* temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
* atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
* hum: Normalized humidity. The values are divided by 100 (max)
* windspeed: Normalized wind speed. The values are divided by 67 (max)
* casual: count of casual users
* registered: count of registered users
* cnt: count of total rental bikes including both casual and registered
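
The normalization formulas in the attribute list can be inverted to recover approximate physical values. The sketch below is illustrative only: the constants come from the descriptions above, the helper names `denormalize_min_max` and `denormalize_by_max` are hypothetical, and the temperature bounds are stated for the hourly scale, so results on daily data should be read as approximations.

```python
# Constants taken from the attribute descriptions above.
TEMP_MIN, TEMP_MAX = -8.0, 39.0      # temp, degrees Celsius
ATEMP_MIN, ATEMP_MAX = -16.0, 50.0   # atemp (feeling temperature), degrees Celsius
HUM_MAX = 100.0                      # hum was divided by 100
WINDSPEED_MAX = 67.0                 # windspeed was divided by 67


def denormalize_min_max(value, lower, upper):
    """Invert (t - t_min) / (t_max - t_min) to recover t."""
    return value * (upper - lower) + lower


def denormalize_by_max(value, maximum):
    """Invert a division by the maximum value."""
    return value * maximum


# Hypothetical normalized readings, chosen only for illustration.
temp_celsius = denormalize_min_max(0.5, TEMP_MIN, TEMP_MAX)
atemp_celsius = denormalize_min_max(0.5, ATEMP_MIN, ATEMP_MAX)
humidity_percent = denormalize_by_max(0.8, HUM_MAX)
wind_speed = denormalize_by_max(0.25, WINDSPEED_MAX)

print(temp_celsius, atemp_celsius, humidity_percent, wind_speed)
```

With pandas, the same arithmetic could be applied column-wise, for example `bike_data['temp'] * 47 - 8`.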

# Dimensionality reduction

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

### Load the bike-management dataset

In [None]:
# load the training dataset
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/ml-basics/daily-bike-share.csv
bike_data = pd.read_csv('daily-bike-share.csv')
bike_data.head()

# Classification

In [None]:
# Import libraries for classification
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns

### Prepare the dataset for classification

In [None]:
# Create a new binary column 'high_rentals' as the classification target
# (1 when the day has more than 500 rentals, 0 otherwise)
bike_data['high_rentals'] = (bike_data['rentals'] > 500).astype(int)

# Select the features (other features can be chosen).
# Column names follow the attribute list above: 'yr', 'mnth', 'hum'.
features = ['season', 'yr', 'mnth', 'holiday', 'weekday', 'workingday', 'temp', 'hum', 'windspeed']

# Define X and y
X = bike_data[features]
y = bike_data['high_rentals']
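
Because the target is created with a fixed threshold, it is worth checking how balanced the two classes are before training: with a strongly skewed split, accuracy alone can be misleading. The sketch below shows one way to do that on a small, hypothetical set of rental counts (`toy_data` is not the real dataset); the threshold of 500 is the one used above.

```python
import pandas as pd

# Hypothetical rental counts, used only to illustrate the check.
toy_data = pd.DataFrame({'rentals': [120, 640, 300, 980, 510, 45, 500, 720]})

# Same rule as above: strictly more than 500 rentals counts as "high".
toy_data['high_rentals'] = (toy_data['rentals'] > 500).astype(int)

# Share of rows in each class (values sum to 1).
class_balance = toy_data['high_rentals'].value_counts(normalize=True)
print(class_balance)
```

On the real data, the equivalent call would be `bike_data['high_rentals'].value_counts(normalize=True)`.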

In [None]:
from sklearn.model_selection import train_test_split

# Split data 70%-30% into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

print ('Training Set: %d rows\nTest Set: %d rows' % (X_train.shape[0], X_test.shape[0]))

In [None]:
from sklearn.preprocessing import StandardScaler

# Scale the data: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
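
Fitting the scaler on the training rows and only transforming the test rows keeps test-set statistics out of the model. One possible way to enforce this automatically is a scikit-learn pipeline. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption, not the bike-sharing data), and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical; not the bike-sharing dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=0
)

# fit() learns the scaling parameters from the training rows only;
# score() reuses those parameters when transforming the test rows.
svm_pipeline = make_pipeline(StandardScaler(), SVC())
svm_pipeline.fit(X_tr, y_tr)

test_accuracy = svm_pipeline.score(X_te, y_te)
print(f'Test accuracy: {test_accuracy:.3f}')
```

A pipeline also makes cross-validation safer, since the scaler is refit inside each fold.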


In [None]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Create and train the SVM model
svm_model = SVC()
svm_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_svm = svm_model.predict(X_test_scaled)

# Evaluate the model
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f'SVM model accuracy: {accuracy_svm}')


In [None]:
# Generate confusion matrix and classification report
conf_matrix = confusion_matrix(y_test, y_pred_svm)
class_report = classification_report(y_test, y_pred_svm)

# Print classification report to the console
print("Classification Report:\n", class_report)
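
The numbers in a classification report can all be derived from the confusion matrix. As a rough illustration, the sketch below computes accuracy, precision, recall, and F1 for the positive class from four counts. The counts are hypothetical, `binary_metrics` is an illustrative helper rather than a scikit-learn function, and the matrix layout `[[tn, fp], [fn, tp]]` matches what `confusion_matrix` returns for labels 0 and 1.

```python
def binary_metrics(tn, fp, fn, tp):
    """Derive accuracy, precision, recall and F1 for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1}


# Hypothetical counts: rows are true classes, columns are predicted classes.
example_matrix = [[50, 10],
                  [5, 35]]
(tn, fp), (fn, tp) = example_matrix

metrics = binary_metrics(tn, fp, fn, tp)
for name, value in metrics.items():
    print(f'{name}: {value:.3f}')
```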

In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Create and train the LDA model
lda_model = LinearDiscriminantAnalysis()
lda_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_lda = lda_model.predict(X_test_scaled)

# Evaluate the model
accuracy_lda = accuracy_score(y_test, y_pred_lda)
print(f'LDA model accuracy: {accuracy_lda}')


In [None]:
# Generate confusion matrix and classification report
conf_matrix = confusion_matrix(y_test, y_pred_lda)
class_report = classification_report(y_test, y_pred_lda)

# Print classification report to the console
print("Classification Report:\n", class_report)

## Unsupervised model section

In [None]:
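# Perform the dimensionality reduction on the standardized numeric features.
# The full DataFrame cannot be passed to PCA: it contains the non-numeric
# 'dteday' column as well as the target columns.
pca = PCA(n_components=2, whiten=False, random_state=2019)
X_pca = pca.fit_transform(StandardScaler().fit_transform(X))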
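
A two-component projection is only informative if those components retain a reasonable share of the total variance. The sketch below shows one way to check this through the `explained_variance_ratio_` attribute of a fitted `PCA`. It uses random synthetic data as a stand-in (an assumption; the `_demo` names are illustrative), so the printed ratios say nothing about the bike-sharing data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical numeric feature matrix with one strongly correlated pair of columns.
X_demo = rng.normal(size=(200, 5))
X_demo[:, 1] = 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=200)

# Standardize first so that no feature dominates because of its scale.
X_demo_scaled = StandardScaler().fit_transform(X_demo)

pca_demo = PCA(n_components=2)
X_demo_pca = pca_demo.fit_transform(X_demo_scaled)

# Fraction of the total variance captured by each retained component.
explained = pca_demo.explained_variance_ratio_
print('Explained variance ratio:', explained)
print('Total retained:', explained.sum())
```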


In [None]:

# Plot the two-dimensional PCA projection, colored by class
plt.figure(figsize=(12,12))

plt.scatter(X_pca[y==0, 0], X_pca[y==0, 1], color='red', alpha=0.5, label='0')
plt.scatter(X_pca[y==1, 0], X_pca[y==1, 1], color='blue', alpha=0.5, label='1')
plt.title("PCA")
plt.ylabel('Second principal component')
plt.xlabel('First principal component')
plt.legend()
plt.show()