# Bay Wheels's trip Dataset Exploration
## by Chrysanthi Polyzoni

> We will be exploring data teken from [Bay Wheels Bike share service](https://www.lyft.com/bikes/bay-wheels/system-data). Bay Wheels offers affordable, accessible, and fun transportation option for everyone. [Bay Area](https://it.wikipedia.org/wiki/San_Francisco_Bay_Area) residents who qualify for CalFresh, SFMTA Lifeline Pass, or PG&E CARE utility discount are eligible to join Bike Share for All program for 5 USD for the first year — now accepting prepaid cards!

The Data
Each trip is anonymized and includes:

* Trip Duration (seconds)
* Start Time and Date
* End Time and Date
* Start Station ID
* Start Station Name
* Start Station Latitude
* Start Station Longitude
* End Station ID
* End Station Name
* End Station Latitude
* End Station Longitude
* Bike ID
* User Type (Subscriber or Customer – “Subscriber” = Member or “Customer” = Casual

This data is provided according to the [Bay Wheels License Agreement](https://baywheels-assets.s3.amazonaws.com/data-license-agreement.html).

<a id='top'></a>
## Preliminary Wrangling

- [Gathering the Data](#aquire)
- [Assessing](#assess)
- [Cleaning](#clean)
- [Exploratory Data Analysis](#analyze)

<a id='aquire'></a>
## Gathering the Data for the greater San Fransisco Bay Area:
The Data for one month of a year is saved in a zipped 'csv' File on the [Website of the Company](https://s3.amazonaws.com/baywheels-data/index.html) . As an example, the File '201801-fordgobike-tripdata.csv.zip' contains all the rides that occured in January 2018.
After programmatically downloading and unzipping all available months of the year 2018,
We will load and concatenate all <i>CSV</i> Files in a Dataframe where we will be conducting our analysis on.

In [1]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

> Load in your dataset and describe its properties through the questions below.
Try and motivate your exploration goals through this section.

Downloading all Files programmatically:¶

In [73]:
iter_wheels in requests.get(base_url).text

True

In [97]:
import requests
import os

csv_downloaded = False # Set to True to prevent downloading Process from occuring \
                      #every time we re-run the notebook
filename_ford = '2019xx-fordgobike-tripdata.csv.zip'
filename_wheels = '2019xx-baywheels-tripdata.csv.zip'
base_url = 'https://s3.amazonaws.com/baywheels-data/'

datafolder = './data' # keeping all csv files in a 'data' directory, under the current directory
if not os.path.exists(datafolder):
    os.makedirs(datafolder)
    print('make {}'.format(datafolder))

for i in range(1, 13):
    if csv_downloaded:
        break
    iter_ford = str(filename_ford).replace("xx", "{:02d}".format(i))
    iter_wheels = str(filename_wheels).replace("xx", "{:02d}".format(i))
    url_ford = base_url + iter_ford
    url_wheels = base_url + iter_wheels
    #print(url_ford)
    #print(url_wheels)
    local_ford = os.path.join(datafolder, iter_ford)
    local_wheels = os.path.join(datafolder, iter_wheels)
    
    # download the zip file locally
    if not (os.path.exists(local_wheels) or os.path.exists(local_wheels)):
        neg_response = []
        for _ in range(1,13):
            
            try:
                response = requests.get(url_wheels)           
            except:
                response = requests.get(url_ford)
            finally:
                neg_responce = neg_response.append(url_wheels[-len(iter_wheels):])

                        
print(neg_response)                   
                        
                        
                        
#if iter_wheels in requests.get(base_url).text:
#os.listdir(datafolder)

['201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip', '201904-baywheels-tripdata.csv.zip']


In [90]:
url_wheels[-len(iter_wheels):]

'201912-baywheels-tripdata.csv.zip'

In [71]:
r = requests.get(base_url)
print(r.text)
iter_wheels in r.text

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>baywheels-data</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>2017-fordgobike-tripdata.csv.zip</Key><LastModified>2020-01-10T15:29:49.000Z</LastModified><ETag>&quot;3e6f40e77dc54fbcca79a534cdec3c3c&quot;</ETag><Size>15149769</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>201801-fordgobike-tripdata.csv.zip</Key><LastModified>2020-01-10T15:29:39.000Z</LastModified><ETag>&quot;59cea251476353ef848fb5e18c3c8cd0&quot;</ETag><Size>2890294</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>201802-fordgobike-tripdata.csv.zip</Key><LastModified>2020-01-10T15:29:39.000Z</LastModified><ETag>&quot;c91c375884bb6665bc9132ea0392dca5&quot;</ETag><Size>3281072</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>201803-fordgobike-tripdata.csv.zip</Key><LastModified>2020-01-10T1

True

In [77]:
import requests
import os

csv_downloaded = False # Set to True to prevent downloading Process from occuring \
                      #every time we re-run the notebook
filename_ford = '2019xx-fordgobike-tripdata.csv.zip'
filename_wheels = '2019xx-baywheels-tripdata.csv.zip'
base_url = 'https://s3.amazonaws.com/baywheels-data/'

datafolder = './data' # keeping all csv files in a 'data' directory, under the current directory
if not os.path.exists(datafolder):
    os.makedirs(datafolder)
    print('make {}'.format(datafolder))

for i in range(1, 13):
    if csv_downloaded:
        break
    iter_ford = str(filename_ford).replace("xx", "{:02d}".format(i))
    iter_wheels = str(filename_wheels).replace("xx", "{:02d}".format(i))
    url_ford = base_url + iter_ford
    url_wheels = base_url + iter_wheels
    print(url_ford)
    print(url_wheels)
    local_ford = os.path.join(datafolder, iter_ford)
    local_wheels = os.path.join(datafolder, iter_wheels)
    
    # download the zip file locally
    if not (os.path.exists(local_wheels) or os.path.exists(local_wheels)):
        if iter_wheels in requests.get(base_url).text:
            try:
                response = requests.get(url_wheels)
                print(response)
                print(response.status_code)
                with open(local_wheels, mode='wb') as file:
                    file.write(response.content)
                    # unzip file to the previously created 'data' directory
                    with zipfile.ZipFile(local_wheels, 'r') as myzip:
                        myzip.extractall(path = datafolder)
                        print("{} --> unzipping file".format(local_wheels))
                        print("{}--> removing file".format(local_wheels))
            except:
                response = requests.get(url_ford)
                print(response)
                print(response.status_code)
                
                with open(local_ford, mode='wb') as file:
                    file.write(response.content)
                    # unzip file to the previously created 'data' directory
                    with zipfile.ZipFile(local_ford, 'r') as myzip:
                        myzip.extractall(path = datafolder)
                        print("{} --> unzipping file".format(local_ford))
                        print("{}--> removing file".format(local_ford))
os.listdir(datafolder)

make ./data
https://s3.amazonaws.com/baywheels-data/201901-fordgobike-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201901-baywheels-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201902-fordgobike-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201902-baywheels-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201903-fordgobike-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201903-baywheels-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201904-fordgobike-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201904-baywheels-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201905-fordgobike-tripdata.csv.zip
https://s3.amazonaws.com/baywheels-data/201905-baywheels-tripdata.csv.zip
<Response [200]>
200
./data\201905-baywheels-tripdata.csv.zip --> unzipping file
./data\201905-baywheels-tripdata.csv.zip--> removing file
https://s3.amazonaws.com/baywheels-data/201906-fordgobike-tripdata.csv.zip
https://s3.amazonaws.com/bayw

['201905-baywheels-tripdata.csv',
 '201905-baywheels-tripdata.csv.zip',
 '201906-baywheels-tripdata.csv',
 '201906-baywheels-tripdata.csv.zip',
 '201907-baywheels-tripdata.csv',
 '201907-baywheels-tripdata.csv.zip',
 '201908-baywheels-tripdata.csv',
 '201908-baywheels-tripdata.csv.zip',
 '201909-baywheels-tripdata.csv',
 '201909-baywheels-tripdata.csv.zip',
 '201910-baywheels-tripdata.csv',
 '201910-baywheels-tripdata.csv.zip',
 '201911-baywheels-tripdata.csv',
 '201911-baywheels-tripdata.csv.zip',
 '201912-baywheels-tripdata.csv',
 '201912-baywheels-tripdata.csv.zip',
 '__MACOSX']

In [25]:
filename = '2019xx-fordgobike-tripdata.csv.zip'
print(len(filename))
201905-baywheels-tripdata.csv
201912-baywheels-tripdata.csv.zip

34


In [40]:
import requests
import os
import zipfile

csv_downloaded = False # Set to True to prevent downloading Process from occuring \
                      #every time we re-run the notebook
filename_ford = '2019xx-fordgobike-tripdata.csv.zip'
filename_wheels = '2019xx-baywheels-tripdata.csv.zip'
base_url = 'https://s3.amazonaws.com/baywheels-data/'

datafolder = './data' # keeping all csv files in a 'data' directory, under the current directory
if not os.path.exists(datafolder):
    os.makedirs(datafolder)
    print('make {}'.format(datafolder))

for i in range(1, 13):
    if csv_downloaded:
        break
    iter_ford = str(filename_ford).replace("xx", "{:02d}".format(i))
    iter_wheels = str(filename_wheels).replace("xx", "{:02d}".format(i))
    url_ford = base_url + iter_ford
    url_wheels = base_url + iter_wheels
    print(url_ford, url_wheels)
    local_ford = os.path.join(datafolder, url_ford)
    local_wheels = os.path.join(datafolder, url_wheels)
    if not os.path.exists(local_ford) or os.path.exists(local_wheels):
        print("{} or {} does not exist. --> download it".format(local_ford, local_wheels))
        # download the zip file locally
        try:
            response = requests.get(url_ford)
            print(response.status_code)
        except:
            response = requests.get(url_wheels)
        print("Response status code for {} is {}".format(iteratefilename, response.status_code))
        if (response.status_code == 200):
            with open(localfile, mode='wb') as file:
                file.write(response.content)
                # unzip file to the previously created 'data' directory
                with zipfile.ZipFile(localfile, 'r') as myzip:
                    myzip.extractall(path=datafolder)
                    print("{} --> unzipping file".format(localfile))
            #remove the zip file
            os.remove(localfile)
            print("{}--> removing file".format(localfile))
            
os.listdir(datafolder)

https://s3.amazonaws.com/baywheels-data/201901-fordgobike-tripdata.csv.zip https://s3.amazonaws.com/baywheels-data/201901-baywheels-tripdata.csv.zip
./data\https://s3.amazonaws.com/baywheels-data/201901-fordgobike-tripdata.csv.zip or ./data\https://s3.amazonaws.com/baywheels-data/201901-baywheels-tripdata.csv.zip does not exist. --> download it
200
Response status code for 201912-fordgobike-tripdata.csv.zip is 200
./data\201912-fordgobike-tripdata.csv.zip --> unzipping file
./data\201912-fordgobike-tripdata.csv.zip--> removing file
https://s3.amazonaws.com/baywheels-data/201902-fordgobike-tripdata.csv.zip https://s3.amazonaws.com/baywheels-data/201902-baywheels-tripdata.csv.zip
./data\https://s3.amazonaws.com/baywheels-data/201902-fordgobike-tripdata.csv.zip or ./data\https://s3.amazonaws.com/baywheels-data/201902-baywheels-tripdata.csv.zip does not exist. --> download it
200
Response status code for 201912-fordgobike-tripdata.csv.zip is 200
./data\201912-fordgobike-tripdata.csv.zip --

['201901-baywheels-tripdata.csv.zip',
 '201901-fordgobike-tripdata.csv',
 '201902-baywheels-tripdata.csv.zip',
 '201902-fordgobike-tripdata.csv',
 '201903-baywheels-tripdata.csv.zip',
 '201903-fordgobike-tripdata.csv',
 '201904-baywheels-tripdata.csv.zip',
 '201904-fordgobike-tripdata.csv',
 '201905-baywheels-tripdata.csv',
 '201906-baywheels-tripdata.csv',
 '201907-baywheels-tripdata.csv',
 '201908-baywheels-tripdata.csv',
 '201909-baywheels-tripdata.csv',
 '201910-baywheels-tripdata.csv',
 '201911-baywheels-tripdata.csv',
 '201912-baywheels-tripdata.csv',
 '__MACOSX']

In [None]:
./data\201806-fordgobike-tripdata.csv.zip
./data\201905-fordgobike-tripdata.csv.zip
./data\201910-fordgobike-tripdata.csv.zip
./data\2019xx-fordgobike-tripdata.csv.zip

### What is the structure of your dataset?

> Your answer here!

### What is/are the main feature(s) of interest in your dataset?

> Your answer here!

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

> Your answer here!

## Univariate Exploration

> In this section, investigate distributions of individual variables. If
you see unusual points or outliers, take a deeper look to clean things up
and prepare yourself to look at relationships between variables.

> Make sure that, after every plot or related series of plots, that you
include a Markdown cell with comments about what you observed, and what
you plan on investigating next.

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

> At the end of your report, make sure that you export the notebook as an
html file from the `File > Download as... > HTML` menu. Make sure you keep
track of where the exported file goes, so you can put it in the same folder
as this notebook for project submission. Also, make sure you remove all of
the quote-formatted guide notes like this one before you finish your report!