# Data Analysis with Pandas

## Dataset: Cycle Share

### Bicycle Trip Data from Seattle's Cycle Share System

#### Alex Angelico
#### 2021-01-19

In [119]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
trips = pd.read_csv('trip.csv')
weather = pd.read_csv('weather.csv')
station = pd.read_csv('station.csv')

In [5]:
trips.info()
weather.info()
station.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 286858 entries, 0 to 286857
Data columns (total 12 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   trip_id            286858 non-null  int64  
 1   starttime          286858 non-null  object 
 2   stoptime           286858 non-null  object 
 3   bikeid             286858 non-null  object 
 4   tripduration       286858 non-null  float64
 5   from_station_name  286858 non-null  object 
 6   to_station_name    286858 non-null  object 
 7   from_station_id    286858 non-null  object 
 8   to_station_id      286858 non-null  object 
 9   usertype           286857 non-null  object 
 10  gender             181557 non-null  object 
 11  birthyear          181553 non-null  float64
dtypes: float64(2), int64(1), object(9)
memory usage: 26.3+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 689 entries, 0 to 688
Data columns (total 21 columns):
 #   Column                      No

## Question 1
### What is the average trip duration for a borrowed bicycle?

In [9]:
# tripduration is recorded in seconds; split the rounded mean into whole minutes and seconds
average_trip_duration = int(round(trips['tripduration'].mean()))
minutes, seconds = divmod(average_trip_duration, 60)
f"The average trip duration for a borrowed bicycle is {minutes} minutes {seconds} seconds."

'The average trip duration for a borrowed bicycle is 19 minutes 38 seconds.'

## Question 2
### What’s the most common age of a bicycle-sharer?

In [11]:
# mode() returns a Series (ties are possible), so take its first value
biker_commonest_age = 2021 - int(trips['birthyear'].mode().iloc[0])
f"The most common age of a bicycle-sharer is {biker_commonest_age}."

'The most common age of a bicycle-sharer is 34.'

## Question 3
### Given all the weather data here, find the average precipitation per month, and the median precipitation.

In [122]:
raw_precipitation = weather[['Date', 'Precipitation_In']]

month_list = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# Dates are month/day/year strings, so the first field is the month number
month_numbers = raw_precipitation['Date'].str.split('/').str[0].astype(int)

precipitation_data = []
for i in range(12):
    # Collect every reading for this calendar month, across all years
    month_data = raw_precipitation.loc[month_numbers == i + 1, 'Precipitation_In'].to_numpy()
    precipitation_data.append(f'{month_list[i]} - mean precipitation: {round(np.mean(month_data), 2)} inches / median precipitation: {np.median(month_data)}')

precipitation_data

['Jan - mean precipitation: 0.14 inches / median precipitation: 0.02',
 'Feb - mean precipitation: 0.17 inches / median precipitation: 0.04',
 'Mar - mean precipitation: 0.16 inches / median precipitation: 0.025',
 'Apr - mean precipitation: 0.05 inches / median precipitation: 0.0',
 'May - mean precipitation: 0.01 inches / median precipitation: 0.0',
 'Jun - mean precipitation: 0.03 inches / median precipitation: 0.0',
 'Jul - mean precipitation: 0.01 inches / median precipitation: 0.0',
 'Aug - mean precipitation: 0.02 inches / median precipitation: 0.0',
 'Sep - mean precipitation: 0.04 inches / median precipitation: 0.0',
 'Oct - mean precipitation: 0.19 inches / median precipitation: 0.04',
 'Nov - mean precipitation: 0.19 inches / median precipitation: 0.035',
 'Dec - mean precipitation: 0.24 inches / median precipitation: 0.1']
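
The same monthly summary can also be sketched with a `groupby` instead of an explicit loop. This is a minimal sketch on a small made-up frame, assuming `Date` holds month/day/year strings as in the cell above; the sample values and the `monthly_precipitation` helper name are illustrative and do not come from `weather.csv`. Like the loop, it pools the same calendar month across all years.

```python
import pandas as pd

def monthly_precipitation(weather: pd.DataFrame) -> pd.DataFrame:
    """Return mean and median precipitation per calendar month."""
    # Assumption: dates are month/day/year strings, e.g. '10/13/2014'
    month = pd.to_datetime(weather['Date'], format='%m/%d/%Y').dt.month
    summary = weather.groupby(month)['Precipitation_In'].agg(['mean', 'median'])
    summary.index.name = 'month'
    return summary

# Hypothetical sample rows, not taken from weather.csv
sample = pd.DataFrame({
    'Date': ['1/1/2015', '1/2/2015', '1/3/2015', '2/1/2015', '2/2/2015'],
    'Precipitation_In': [0.0, 0.3, 0.0, 0.2, 0.4],
})
summary = monthly_precipitation(sample)
print(summary.round(2))
```

The result is a DataFrame indexed by month number with one `mean` and one `median` column, which is easier to plot or join than a list of formatted strings.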

## Question 4
### What’s the average number of bikes at a given bike station?

In [10]:
average_station_bikes = round(station['current_dockcount'].mean())
f"The average number of bikes at a given bike station is {average_station_bikes}."

'The average number of bikes at a given bike station is 17.'

## Question 5
### When a bike station is modified, is it more likely that it’ll lose bikes or gain bikes? How do you know?

In [20]:
installed = round(station['install_dockcount'].mean())
modified = round(station['current_dockcount'].mean())

if installed < modified:
    print(f'When a bike station is modified, it is more likely that it will gain bikes than lose bikes. We know this because the average count of bikes at all stations at the time of station installation is {installed}, and the average count at all stations after modification is {modified}.')
else:
    print(f'When a bike station is modified, it is more likely that it will lose bikes than gain bikes. We know this because the average count of bikes at all stations at the time of station installation is {installed}, and the average count at all stations after modification is {modified}.')

When a bike station is modified, it is more likely that it will lose bikes than gain bikes. We know this because the average count of bikes at all stations at the time of station installation is 18, and the average count at all stations after modification is 17.
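
Comparing the two averages shows the net change across all stations, but "more likely" is a question about individual stations. A per-station count is one way to check it. This is a minimal sketch on made-up rows, assuming a station counts as modified when `current_dockcount` differs from `install_dockcount`; the sample values and the `dock_change_counts` helper name are illustrative and not from `station.csv`.

```python
import pandas as pd

def dock_change_counts(station: pd.DataFrame) -> dict:
    """Count stations whose dock count rose or fell since installation."""
    diff = station['current_dockcount'] - station['install_dockcount']
    # Assumption: a non-zero difference means the station was modified
    changed = diff[diff != 0]
    return {
        'gained': int((changed > 0).sum()),
        'lost': int((changed < 0).sum()),
    }

# Hypothetical sample rows, not taken from station.csv
sample = pd.DataFrame({
    'install_dockcount': [18, 16, 20, 14, 18],
    'current_dockcount': [18, 14, 18, 16, 18],
})
counts = dock_change_counts(sample)
print(counts)
```

Running `dock_change_counts(station)` on the real data would give the number of modified stations in each direction, which answers the question more directly than the difference between two rounded means.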


## Question 6a
### Do members or short-term pass holders take longer average trips on borrowed bikes?

In [123]:
usertype_mean_duration = trips.groupby('usertype')['tripduration'].mean().sort_values(ascending=False)
usertype_mean_duration

usertype
Short-Term Pass Holder    2183.761816
Member                     595.142260
Name: tripduration, dtype: float64
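
Because `tripduration` is recorded in seconds, the comparison is easier to read in minutes. A small sketch using the two means from the output above, rebuilt by hand as a Series so that it runs on its own:

```python
import pandas as pd

# Means in seconds, copied from the output above
usertype_mean_duration = pd.Series(
    {'Short-Term Pass Holder': 2183.761816, 'Member': 595.142260},
    name='tripduration',
)

# Convert seconds to minutes and compare the two groups
mean_minutes = (usertype_mean_duration / 60).round(1)
ratio = usertype_mean_duration['Short-Term Pass Holder'] / usertype_mean_duration['Member']

print(mean_minutes)
print(f"Short-term pass holders ride about {ratio:.1f} times as long as members on average.")
```

Short-term pass holders therefore take the longer trips: roughly 36 minutes on average, against about 10 minutes for members.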