<a href="https://colab.research.google.com/github/Mehul6112/Data-Science_curve/blob/main/Bike%20Rental%20Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction:**

Welcome to my exploration of bike rental data from Yulu, a bike rental company. In this notebook we analyze the dataset to uncover the rental patterns and trends hidden in it, using the Python libraries pandas, matplotlib, and seaborn.

## **Data Exploration:**

1. **Data Cleaning:**
    * Load the dataset into a pandas DataFrame.
    * Check for missing values and handle them appropriately.
    * Identify and remove any outliers or inconsistencies in the data.

2. **Exploratory Data Analysis:**
    * Investigate the distribution of bike rentals across different time periods (e.g., hourly, daily, weekly, monthly).
    * Analyze the relationship between weather conditions and bike rentals.
    * Explore the impact of user demographics (e.g., age, gender) on bike rental patterns.
    * Identify popular pick-up and drop-off locations.

In [1]:
# Importing necessary modules
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

In [2]:
df = pd.read_csv("/content/sample_data/Yulu.csv")
df.sample(10)  # show 10 randomly selected rows from the dataset

Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
3023,2011-07-14 21:00:00,3,0,1,1,27.06,31.06,50,8.9981,58,203,261
2192,2011-05-18 06:00:00,2,0,1,1,21.32,25.0,100,8.9981,2,99,101
9370,2012-09-13 19:00:00,3,0,1,1,26.24,30.305,65,8.9981,80,594,674
1796,2011-05-01 18:00:00,2,0,0,2,18.86,22.725,82,6.0032,33,126,159
161,2011-01-08 00:00:00,1,0,0,2,7.38,9.85,51,11.0014,1,24,25
4034,2011-09-19 03:00:00,3,0,1,2,20.5,24.24,77,8.9981,1,4,5
1398,2011-04-04 03:00:00,2,0,1,1,15.58,19.695,66,19.0012,1,0,1
8518,2012-07-16 07:00:00,3,0,1,1,29.52,34.09,70,6.0032,24,459,483
10630,2012-12-09 08:00:00,4,0,0,3,16.4,20.455,87,11.0014,1,68,69
3501,2011-08-15 19:00:00,3,0,1,1,28.7,32.575,65,0.0,54,343,397


<a name = "About"></a>
## About this data

| Column         | Description                                                          |
|----------------|----------------------------------------------------------------------|
| **datetime**   | Date and Time                                                        |
| **season**     | Season <br> (1: Spring, 2: Summer, 3: Fall, 4: Winter)               |
| **holiday**    | Holiday Indicator <br> (1: Yes, 0: No)                               |
| **workingday** | Working Day Indicator <br> (1: Yes, 0: No)                           |
| **weather**    | Weather Conditions: <br>1: Clear, Few Clouds, Partly Cloudy <br>2: Mist, Cloudy <br>3: Light Snow, Light Rain <br>4: Heavy Rain, Snow, Fog |
| **temp**       | Temperature (°C)                                                     |
| **atemp**      | "Feels Like" Temperature (°C)                                         |
| **humidity**   | Humidity (%)                                                         |
| **windspeed**  | Wind Speed (km/h)                                                    |
| **casual**     | Count of Casual Users                                                |
| **registered** | Count of Registered Users                                            |
| **count**      | Total Count of Rental Bikes (Casual + Registered Users)              |


## Critical Factors to Assess in Dataset Analysis:

Explore the dataset with a keen eye to uncover various anomalies and issues that may affect data quality and analysis. Here are some key aspects to consider:

* **Duplicate Rows:** 🔄 Check for duplicate rows in the dataset, which can distort analysis and lead to inaccurate insights. Eliminate duplicates to ensure data integrity and reliability.

* **Outliers:** 📊 Identify outliers lurking within the data, as they can skew statistical analyses and distort patterns. Carefully examine outliers to determine their impact and consider appropriate handling strategies.

* **Incorrect Data Types:** 📝 Ensure that each column's data type aligns with its intended purpose. Verify that numerical data is represented as integers or floats, dates are in datetime format, and categorical variables are appropriately encoded.

* **Inconsistent Data Entry:** 🔄 Look out for inconsistencies in data entry, such as variations in spelling, formatting, or capitalization. Standardize data entry conventions to maintain consistency and improve data quality.


In [3]:
# duplicated() marks each value that repeats an earlier one as True; sum() counts those True flags.
# A result of 0 means no timestamp appears more than once.
df.datetime.duplicated().sum()  # each row should have a unique timestamp, so a repeat would indicate duplicate rows

0
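
The check above looks only at the `datetime` column. As a complementary sketch, the snippet below counts rows that are duplicated across every column. It uses a small made-up frame (`toy` is illustrative, not the Yulu data) so the behaviour is easy to follow.

```python
import pandas as pd

# Illustrative frame, not the Yulu data: the last row repeats the first one exactly.
toy = pd.DataFrame({
    "datetime": ["2011-01-08 00:00:00", "2011-01-08 01:00:00", "2011-01-08 00:00:00"],
    "count": [25, 40, 25],
})

# duplicated() marks a row True when an identical row appeared earlier;
# summing the boolean Series counts those rows.
full_row_duplicates = toy.duplicated().sum()

# Restricting the check to one column mirrors the datetime-only test above.
timestamp_duplicates = toy.duplicated(subset=["datetime"]).sum()

# drop_duplicates() keeps the first occurrence of each repeated row.
deduplicated = toy.drop_duplicates()
print(full_row_duplicates, timestamp_duplicates, len(deduplicated))
```

On the real data, the same calls would be applied to `df` directly.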

In [4]:
# converting date-time column to a more usable format that is native to pandas
df['datetime'] = pd.to_datetime(df['datetime'])
df.dtypes

datetime      datetime64[ns]
season                 int64
holiday                int64
workingday             int64
weather                int64
temp                 float64
atemp                float64
humidity               int64
windspeed            float64
casual                 int64
registered             int64
count                  int64
dtype: object
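
With `datetime` stored as `datetime64[ns]`, the hourly, daily, and monthly breakdowns listed under Exploratory Data Analysis can be derived through the `.dt` accessor. The sketch below shows the idea on three timestamps copied from the sample rows; the new column names (`hour`, `month`, `weekday`) are choices made for this example.

```python
import pandas as pd

# Three timestamps copied from the sample rows shown earlier.
stamps = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-07-14 21:00:00",
        "2011-05-18 06:00:00",
        "2012-12-09 08:00:00",
    ])
})

# The .dt accessor exposes calendar components of a datetime64 column.
stamps["hour"] = stamps["datetime"].dt.hour
stamps["month"] = stamps["datetime"].dt.month
stamps["weekday"] = stamps["datetime"].dt.day_name()
print(stamps)
```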

Let's check for any missing values and process the data.

In [5]:
# Check for missing values
missing_values = df.isnull()
# Summarize missing values
missing_values_summary = missing_values.sum()
print(missing_values_summary)


datetime      0
season        0
holiday       0
workingday    0
weather       0
temp          0
atemp         0
humidity      0
windspeed     0
casual        0
registered    0
count         0
dtype: int64


Looks like our data has no missing values, which is great. <br>
However, you may have noticed that certain columns like Season, Weather, Holiday, and Working Day contain numerical data with specific meanings, as elaborated in the [**About this Data**](#About) section. It would greatly enhance our analysis if we could map these numerical values to their corresponding descriptions. Doing so will make our visualizations more intuitive.

In [6]:
# Use map() to replace the numeric codes with readable labels (any code missing from a dict would become NaN)
df.holiday = df.holiday.map({0: 'No', 1: 'Yes'})
df.season = df.season.map({1: "Spring", 2: "Summer", 3: "Fall", 4: "Winter"})
df.workingday = df.workingday.map({0: 'No', 1: 'Yes'})
df.weather = df.weather.map({1: "Clear, Few Clouds, Partly Cloudy", 2: "Mist, Cloudy", 3: "Light Snow, Light Rain", 4: "Heavy Rain, Snow, Fog"})
df.sample(10) # to see a sample of our improved dataset

Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
9170,2012-09-05 11:00:00,Fall,No,Yes,"Mist, Cloudy",31.98,37.12,62,11.0014,61,156,217
6581,2012-03-11 12:00:00,Spring,No,No,"Clear, Few Clouds, Partly Cloudy",16.4,20.455,37,19.9995,146,264,410
10491,2012-12-03 13:00:00,Winter,No,Yes,"Clear, Few Clouds, Partly Cloudy",23.78,27.275,56,6.0032,81,240,321
647,2011-02-10 05:00:00,Spring,No,Yes,"Mist, Cloudy",4.92,6.06,50,15.0013,0,6,6
5337,2011-12-16 11:00:00,Winter,No,Yes,"Mist, Cloudy",13.94,15.15,42,19.9995,9,135,144
2257,2011-06-01 23:00:00,Summer,No,Yes,"Clear, Few Clouds, Partly Cloudy",30.34,33.335,51,11.0014,10,59,69
7841,2012-06-07 02:00:00,Summer,No,Yes,"Clear, Few Clouds, Partly Cloudy",21.32,25.0,77,6.0032,0,12,12
3418,2011-08-12 08:00:00,Fall,No,Yes,"Clear, Few Clouds, Partly Cloudy",27.88,31.82,39,15.0013,29,397,426
1267,2011-03-17 15:00:00,Spring,No,Yes,"Clear, Few Clouds, Partly Cloudy",21.32,25.0,42,16.9979,30,95,125
5628,2012-01-09 15:00:00,Spring,No,Yes,"Light Snow, Light Rain",9.02,11.365,75,11.0014,5,64,69


Looks much better, right?
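
One side effect of mapping to plain strings is that the labels sort alphabetically (Fall, Spring, Summer, Winter) rather than in calendar order. A possible refinement is an ordered categorical dtype that keeps the order given in the About table. It is sketched here on a small illustrative Series rather than on `df`.

```python
import pandas as pd

# Order taken from the "About this data" table (1: Spring ... 4: Winter).
season_order = ["Spring", "Summer", "Fall", "Winter"]
season_dtype = pd.CategoricalDtype(categories=season_order, ordered=True)

# Illustrative values standing in for df.season after mapping.
seasons = pd.Series(["Fall", "Spring", "Winter", "Summer", "Spring"])
seasons = seasons.astype(season_dtype)

# Sorting now follows the declared order instead of the alphabet.
print(seasons.sort_values().tolist())
```

Applied to the notebook's frame, this would amount to `df["season"] = df["season"].astype(season_dtype)`.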

In [7]:
# Now let's get an overview of the numerical columns using the .describe() function.
df.describe()

Unnamed: 0,temp,atemp,humidity,windspeed,casual,registered,count
count,10886.0,10886.0,10886.0,10886.0,10886.0,10886.0,10886.0
mean,20.23086,23.655084,61.88646,12.799395,36.021955,155.552177,191.574132
std,7.79159,8.474601,19.245033,8.164537,49.960477,151.039033,181.144454
min,0.82,0.76,0.0,0.0,0.0,0.0,1.0
25%,13.94,16.665,47.0,7.0015,4.0,36.0,42.0
50%,20.5,24.24,62.0,12.998,17.0,118.0,145.0
75%,26.24,31.06,77.0,16.9979,49.0,222.0,284.0
max,41.0,45.455,100.0,56.9969,367.0,886.0,977.0
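
The quartiles in this summary also allow a rough outlier screen. Under the common 1.5 × IQR rule (an assumption here, since the notebook does not fix an outlier definition), the upper fence for `count` would be 284 + 1.5 × (284 - 42) = 647, so the maximum of 977 lies well above it. The helper `iqr_bounds` below is a hypothetical name introduced for illustration. It is demonstrated on the ten `count` values from the first sample output, not on the full dataset.

```python
import pandas as pd

def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the k * IQR rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# count values copied from the first df.sample(10) output.
sample_counts = pd.Series([261, 101, 674, 159, 25, 5, 1, 483, 69, 397])

lower, upper = iqr_bounds(sample_counts)
# Boolean mask selecting values outside the fences.
flagged = sample_counts[(sample_counts < lower) | (sample_counts > upper)]
print(lower, upper, len(flagged))
```

Whether flagged rows should be removed is a separate decision: a high hourly count may simply reflect a busy hour rather than a recording error.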


## Visualization:

1. **Data Visualization:**
    * Utilize matplotlib and seaborn to create compelling visualizations that showcase the findings from our data exploration.
    * Generate bar charts, histograms, scatter plots, and heatmaps to illustrate trends and patterns in the data.
    * Ensure that the visualizations are clear, informative, and visually appealing.

2. **Interactive Visualization:**
    * Employ interactive visualization libraries such as Bokeh or Plotly to create interactive dashboards and plots.
    * Allow users to explore the data and gain insights based on their own selections and filters.
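
As one possible starting point for the bar charts mentioned above, the sketch below plots the mean rental count per hour of the day with matplotlib. The function name `plot_mean_by_hour` is hypothetical. The demo frame holds only the ten rows from the first sample output, which is far too few to reveal a real pattern, so in the notebook `df` would be passed instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_mean_by_hour(frame: pd.DataFrame):
    """Bar chart of the mean rental count for each hour of the day."""
    hourly_mean = frame.groupby(frame["datetime"].dt.hour)["count"].mean()
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(hourly_mean.index, hourly_mean.values)
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Mean rentals")
    ax.set_title("Mean rentals by hour")
    return ax

# datetime and count values copied from the first df.sample(10) output.
rows = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-07-14 21:00:00", "2011-05-18 06:00:00", "2012-09-13 19:00:00",
        "2011-05-01 18:00:00", "2011-01-08 00:00:00", "2011-09-19 03:00:00",
        "2011-04-04 03:00:00", "2012-07-16 07:00:00", "2012-12-09 08:00:00",
        "2011-08-15 19:00:00",
    ]),
    "count": [261, 101, 674, 159, 25, 5, 1, 483, 69, 397],
})

ax = plot_mean_by_hour(rows)
```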


## **Insights and Conclusion:**

1. **Meaningful Questions and Inquiries:**
    * What are the peak hours and days for bike rentals?
    * How does weather affect the demand for bike rentals?
    * Are there any specific user demographics that are more likely to rent bikes?
    * Which pick-up and drop-off locations are the most popular?
    * Can we predict future bike rental demand based on historical data?

2. **Conclusions and Recommendations:**
    * Summarize the key insights and findings from our analysis.
    * Provide recommendations for Yulu based on our findings.
    * Suggest potential areas for future exploration and improvement.
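
The weather question above can be approached with a group-by. The sketch uses the `weather` codes and `count` values from the ten rows of the first sample output, so the printed numbers illustrate the mechanics only and say nothing reliable about the full dataset. The output column names `mean_count` and `hours` are choices made for this example.

```python
import pandas as pd

# weather codes and count values copied from the first df.sample(10) output.
rows = pd.DataFrame({
    "weather": [1, 1, 1, 2, 2, 2, 1, 1, 3, 1],
    "count": [261, 101, 674, 159, 25, 5, 1, 483, 69, 397],
})

# For each weather code: the mean hourly rentals and the number of hours observed.
summary = rows.groupby("weather")["count"].agg(mean_count="mean", hours="size")
print(summary)
```

Reporting the number of hours next to each mean helps show when a group is too small to support a conclusion, as with the single row for weather code 3 here.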

## **Engaging Presentation:**

1. **Clear and Concise Language:**
    * Use simple and easy-to-understand language to explain the analysis and findings.
    * Avoid technical jargon and complex statistical terms.

2. **Storytelling:**
    * Present the analysis as a compelling story that captivates the audience.
    * Use anecdotes and real-world examples to illustrate the insights.

3. **Interactive Elements:**
    * Incorporate interactive elements such as sliders, drop-down menus, and clickable maps to engage the audience.
    * Allow users to explore the data and discover insights on their own.

