In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

* Importing all necessary libraries for the project
* numpy -> used to work with arrays
* pandas -> used to work with datasets 
* matplotlib & seaborn -> used for plotting graphs

In [2]:
data=pd.read_csv('../input/gym-data-472mbah/data.csv')
data.head(5)

* Importing and viewing the dataset
* pd.read_csv -> to read the CSV file into a DataFrame
* .head(5) -> to display the first 5 rows of the dataset

In [3]:
data.describe()

* .describe() -> gives summary statistics (count, mean, std, min, quartiles, max) for each numeric column

In [4]:
data.isnull().values.any()

* Checking if any value in any column is null
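* A single True/False does not say where the gaps are. As a minimal sketch, assuming a small made-up frame (the column names below are illustrative, not taken from the gym dataset), .isnull().sum() counts the missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names are illustrative only
toy = pd.DataFrame({
    "people": [12, 30, np.nan, 25],
    "temperature": [71.5, 68.0, 70.2, np.nan],
    "hour": [9, 10, 11, 12],
})

any_null = toy.isnull().values.any()   # one True/False for the whole frame
null_counts = toy.isnull().sum()       # number of missing values in each column

print(any_null)
print(null_counts)
```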

In [5]:
data['date']=pd.to_datetime(data['date'])
# The lambda argument is named ts so it does not shadow the DataFrame "data"
data['year']=data['date'].apply(lambda ts: ts.year)
data['month']=data['date'].apply(lambda ts: ts.month)
data['day']=data['date'].apply(lambda ts: ts.day)
data['hour']=data['date'].apply(lambda ts: ts.hour)
data['minute']=data['date'].apply(lambda ts: ts.minute)

* Breaking the "date" column into year, month, day, hour and minute
* This creates 5 new columns
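* The same split can probably be done without .apply by using the .dt accessor. This is only a sketch on a made-up two-row frame, and it assumes the column parses to a single datetime64 dtype (for example, no mixed time-zone offsets); otherwise .dt is not available and the .apply version is the safer choice:

```python
import pandas as pd

# Hypothetical timestamps, not taken from the gym dataset
stamps = pd.DataFrame({"date": ["2017-03-14 18:45:00", "2016-12-01 07:05:00"]})
stamps["date"] = pd.to_datetime(stamps["date"])

# .dt exposes the datetime components of the whole column at once
for part in ["year", "month", "day", "hour", "minute"]:
    stamps[part] = getattr(stamps["date"].dt, part)

print(stamps)
```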

In [6]:
data.head()

In [7]:
# Drop the column by name rather than by position, so the intent is explicit
data.drop(columns=['date'],inplace=True)
data.head()

* We drop the date column, since the parts we need are now in separate columns

In [8]:
x=data.iloc[:,1:].values
y=data.iloc[:,0].values

* Here we assign the dependent variable (y, the first column) and the independent variables (x, all remaining columns)
* We will run the regression on these two arrays

In [9]:
print(x)

In [10]:
print(y)

# **We don't need to handle missing data, as there are no null values.**

# **Nor do we need to encode the data, as every feature column is already numeric.**

In [11]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

* Importing train_test_split and splitting the data into training and test sets
* We use 80% of the dataset for training and 20% for testing the model; random_state = 0 makes the split reproducible
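* To see what the split returns, here is a minimal sketch on made-up arrays (x_demo and y_demo are illustrative names, not the gym data). With 50 rows and test_size = 0.2, 40 rows go to training and 10 to testing, and each feature row stays paired with its target:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: row i of x_demo is [2i, 2i+1] and its target is i
x_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.2, random_state=0)

print(x_tr.shape, x_te.shape)              # sizes of the two feature sets
print(np.array_equal(x_tr[:, 0], 2 * y_tr))  # rows and targets are still aligned
```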

In [12]:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(x_train, y_train)

* Importing the Random Forest Regressor from sklearn; n_estimators = 10 builds a forest of 10 trees
* We train the model on our training sets (x_train and y_train)
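* As a rough sketch of the same fit-then-predict pattern, the block below uses synthetic data (an assumption for illustration: three random features, with the target depending only on the first one). The fitted model's feature_importances_ attribute should then point at that first feature:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data, not the gym dataset: target is driven by feature 0 plus small noise
rng = np.random.default_rng(0)
features = rng.uniform(0, 10, size=(200, 3))
target = 3.0 * features[:, 0] + rng.normal(0, 0.1, size=200)

f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(f_train, t_train)       # fit on the training portion only
t_pred = forest.predict(f_test)    # predict on rows the model has not seen

print(t_pred.shape)
print(forest.feature_importances_)
```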

In [13]:
y_pred = regressor.predict(x_test)

* We store all the predicted values into y_pred using the regressor model we just created

In [14]:
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

* We print the predicted values (first column) next to the actual values (second column).
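* A shorter way to build the same two-column view is np.column_stack. The sketch below uses made-up numbers (pred_demo and actual_demo are illustrative) and also prints the mean absolute error as a quick check of how far apart the columns are:

```python
import numpy as np

# Hypothetical predictions and actual values
pred_demo = np.array([31.4, 12.0, 45.9])
actual_demo = np.array([30.0, 14.0, 47.0])

# Stack the 1-D arrays as columns: predicted first, actual second
side_by_side = np.column_stack((pred_demo, actual_demo))
abs_error = np.abs(pred_demo - actual_demo)

print(side_by_side)
print(abs_error.mean())
```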

In [15]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

* Here we see the R² score of the Random Forest on the test set is about 0.93 (93%).
* Note: The closer the R² score is to 1, the better the model explains the variation in the target.
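* For reference, R² is 1 minus the ratio of the squared prediction errors to the squared deviations from the mean. The sketch below computes it by hand on made-up numbers (r2_manual is an illustrative helper, not part of sklearn); a perfect prediction gives 1 and always predicting the mean gives 0:

```python
import numpy as np

def r2_manual(y_true, y_hat):
    """R^2 = 1 - SS_res / SS_tot, computed directly from the definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)           # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared deviations from the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical values: SS_res = 18, SS_tot = 500, so R^2 is about 0.964
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

print(r2_manual(actual, predicted))
```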