# 🚌 Total rides in a month

## 📘 1. Task
Your goal is to find the total number of rides taken by passengers who passed through the Wilson station of Chicago's public transportation system on weekdays in July.

## 💾 2. Data sets
You will use three data sets:

- calendar data set, which contains *year, month, day, day_type* columns
- ridership data set, which contains *station_id, year, month, day, rides* columns
- stations data set, which contains *station_id, station_name, location* columns

You can find these data sets [here](https://github.com/alexandragrecu/Total-riders-in-a-month).

## 📚 3. Import libraries


In [1]:
import pandas as pd
from google.colab import files

## 🔣 4. Load data frames
Upload the three pickle files to Colab, then read each one into its own DataFrame.

In [2]:
uploaded = files.upload()

Saving cta_calendar.p to cta_calendar.p
Saving cta_ridership.p to cta_ridership.p
Saving stations.p to stations.p


In [5]:
calendar = pd.read_pickle('cta_calendar.p')
ridership = pd.read_pickle('cta_ridership.p')
stations = pd.read_pickle('stations.p')
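
Outside Colab, `google.colab.files` is not available. The sketch below shows one way to read the same three files from a local folder instead. `load_frames`, `FILE_NAMES`, and `data_dir` are hypothetical names introduced for illustration, and the code assumes the files were already downloaded from the repository.

```python
from pathlib import Path

import pandas as pd

# File names as they appear in the upload step above.
FILE_NAMES = {
    "calendar": "cta_calendar.p",
    "ridership": "cta_ridership.p",
    "stations": "stations.p",
}


def load_frames(data_dir="."):
    """Read the three pickled DataFrames from data_dir (hypothetical helper)."""
    data_dir = Path(data_dir)
    frames = {}
    for name, file_name in FILE_NAMES.items():
        path = data_dir / file_name
        if not path.exists():
            # Fail early with a readable message instead of a bare traceback.
            raise FileNotFoundError(f"Expected {path}; download it first")
        frames[name] = pd.read_pickle(path)
    return frames
```

A call such as `frames = load_frames("data")` would then give `frames["calendar"]`, `frames["ridership"]`, and `frames["stations"]`. Because unpickling can execute arbitrary code, only load pickle files from a source you trust.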


**Let's have a look at our datasets:**

In [4]:
calendar.head()

| | year | month | day | day_type |
|---|---|---|---|---|
| 0 | 2019 | 1 | 1 | Sunday/Holiday |
| 1 | 2019 | 1 | 2 | Weekday |
| 2 | 2019 | 1 | 3 | Weekday |
| 3 | 2019 | 1 | 4 | Weekday |
| 4 | 2019 | 1 | 5 | Saturday |


In [6]:
ridership.head()

| | station_id | year | month | day | rides |
|---|---|---|---|---|---|
| 0 | 40010 | 2019 | 1 | 1 | 576 |
| 1 | 40010 | 2019 | 1 | 2 | 1457 |
| 2 | 40010 | 2019 | 1 | 3 | 1543 |
| 3 | 40010 | 2019 | 1 | 4 | 1621 |
| 4 | 40010 | 2019 | 1 | 5 | 719 |


In [7]:
stations.head()

| | station_id | station_name | location |
|---|---|---|---|
| 0 | 40010 | Austin-Forest Park | (41.870851, -87.776812) |
| 1 | 40020 | Harlem-Lake | (41.886848, -87.803176) |
| 2 | 40030 | Pulaski-Lake | (41.885412, -87.725404) |
| 3 | 40040 | Quincy/Wells | (41.878723, -87.63374) |
| 4 | 40050 | Davis | (42.04771, -87.683543) |


**Merge the data frames (inner join, the default for `merge`):**

In [9]:
rides = ridership.merge(calendar, on=["year", "month", "day"]) \
          .merge(stations, on="station_id")

rides.head()

| | station_id | year | month | day | rides | day_type | station_name | location |
|---|---|---|---|---|---|---|---|---|
| 0 | 40010 | 2019 | 1 | 1 | 576 | Sunday/Holiday | Austin-Forest Park | (41.870851, -87.776812) |
| 1 | 40010 | 2019 | 1 | 2 | 1457 | Weekday | Austin-Forest Park | (41.870851, -87.776812) |
| 2 | 40010 | 2019 | 1 | 3 | 1543 | Weekday | Austin-Forest Park | (41.870851, -87.776812) |
| 3 | 40010 | 2019 | 1 | 4 | 1621 | Weekday | Austin-Forest Park | (41.870851, -87.776812) |
| 4 | 40010 | 2019 | 1 | 5 | 719 | Saturday | Austin-Forest Park | (41.870851, -87.776812) |


In [12]:
rides.shape

(3285, 8)
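
An inner join keeps only rows whose keys appear in both frames, so any ridership row without a matching calendar day or station silently disappears. The sketch below illustrates this on toy frames that reuse the real column names; the values are made up, and `validate="many_to_one"` is an optional safeguard that is not used in the original notebook.

```python
import pandas as pd

# Toy frames with the same column names as the real data (values are made up).
ridership = pd.DataFrame({
    "station_id": [1, 1, 2, 3],
    "year": [2019, 2019, 2019, 2019],
    "month": [7, 7, 7, 7],
    "day": [1, 6, 1, 1],
    "rides": [100, 40, 200, 50],
})
calendar = pd.DataFrame({
    "year": [2019, 2019],
    "month": [7, 7],
    "day": [1, 6],
    "day_type": ["Weekday", "Saturday"],
})
stations = pd.DataFrame({
    "station_id": [1, 2],
    "station_name": ["Wilson", "Davis"],
})

# how="inner" is the default and is spelled out here for clarity.
# validate raises pandas.errors.MergeError if the right-hand keys are
# not unique, which would otherwise duplicate ride rows and inflate sums.
rides = (
    ridership
    .merge(calendar, on=["year", "month", "day"], how="inner", validate="many_to_one")
    .merge(stations, on="station_id", how="inner", validate="many_to_one")
)

# Station 3 has no entry in stations, so its row is dropped by the inner join.
print(rides.shape)
```

Comparing `len(ridership)` with `len(rides)` after the merge is a quick way to see whether the join dropped any rows.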

**Create filter criteria:**

In [15]:
filter_criteria = (
    (rides["month"] == 7)
    & (rides["day_type"] == "Weekday")
    & (rides["station_name"] == "Wilson")
)

print(filter_criteria)

0       False
1       False
2       False
3       False
4       False
        ...  
3280    False
3281    False
3282    False
3283    False
3284    False
Length: 3285, dtype: bool
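
The printed mask is truncated and shows only `False` values at the head and tail, so it does not reveal whether any row matched. As a minimal sketch on made-up toy data (not the real ridership figures), summing the mask counts the `True` entries:

```python
import pandas as pd

# Toy rows using the merged frame's column names (values are made up).
rides = pd.DataFrame({
    "month": [7, 7, 7, 8],
    "day_type": ["Weekday", "Saturday", "Weekday", "Weekday"],
    "station_name": ["Wilson", "Wilson", "Davis", "Wilson"],
    "rides": [100, 40, 200, 70],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
filter_criteria = (
    (rides["month"] == 7)
    & (rides["day_type"] == "Weekday")
    & (rides["station_name"] == "Wilson")
)

# True counts as 1, so sum() gives the number of matching rows.
matching_rows = int(filter_criteria.sum())
print(matching_rows)

# Indexing with the mask shows the matching rows themselves.
print(rides[filter_criteria])
```

On the real data, a count of zero would point to a misspelled station name or day type rather than a genuine absence of rides.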


**Sum the rides in the rows that match the filter:**

In [17]:
number_of_rides = rides.loc[filter_criteria, "rides"].sum()

print("Total number of rides provided to passengers passing through the Wilson station when riding Chicago's public transportation system on weekdays in July is", number_of_rides)

Total number of rides provided to passengers passing through the Wilson station when riding Chicago's public transportation system on weekdays in July is 140005
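
One way to double-check a total like this is to compute it a second way. The sketch below, again on made-up toy data, compares the mask-based sum with the same condition written as a `DataFrame.query` string. It also breaks the matching rows down by year: the filter does not restrict the year, and while the previewed rows are all from 2019, a file covering several years would mix several Julys into one sum.

```python
import pandas as pd

# Toy data with the merged frame's column names (values are made up).
rides = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019, 2019],
    "month": [7, 7, 7, 7, 8],
    "day_type": ["Weekday", "Weekday", "Saturday", "Weekday", "Weekday"],
    "station_name": ["Wilson", "Wilson", "Wilson", "Davis", "Wilson"],
    "rides": [100, 120, 40, 200, 70],
})

# Approach used in the notebook: boolean mask plus .loc.
mask = (
    (rides["month"] == 7)
    & (rides["day_type"] == "Weekday")
    & (rides["station_name"] == "Wilson")
)
total_loc = rides.loc[mask, "rides"].sum()

# Cross-check: the same condition expressed as a query string.
total_query = rides.query(
    "month == 7 and day_type == 'Weekday' and station_name == 'Wilson'"
)["rides"].sum()

# Per-year breakdown of the matching rows, useful if several years are present.
per_year = rides[mask].groupby("year")["rides"].sum()

assert total_loc == total_query
print(total_loc)
print(per_year)
```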
