<a href="https://colab.research.google.com/github/Hinakoushar-Tatakoti/Quantifying-the-influence-of-imputing-missing-values-Bike-Sharing-System/blob/main/BSS-Data_Extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# @title Setup
from google.colab import auth
from google.cloud import bigquery
from google.colab import data_table

# Authenticate first so the BigQuery client is created with the Colab user's credentials
auth.authenticate_user()

project = 'possible-arbor-361121' # Project ID inserted based on the query results selected to explore
location = 'US' # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)

# Render DataFrames as interactive tables in Colab
data_table.enable_dataframe_formatter()

## Reference SQL syntax from the original job
The cell below fetches the original job by its ID and prints the SQL
statement it ran (see the ```jobs.query```
[method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query)
for the underlying REST reference). Copy the statement from the output cell to
edit the query now or in the future. Alternatively, use
[this link](https://console.cloud.google.com/bigquery?j=possible-arbor-361121:US:bquxjob_24d548eb_1843947b048)
to return to BigQuery and edit the query within the BigQuery user interface.

In [2]:
# Running this code will display the query used to generate your previous job

job = client.get_job('bquxjob_24d548eb_1843947b048') # Job ID inserted based on the query results selected to explore
print(job.query)

SELECT * FROM `possible-arbor-361121.bike_share01.bike_share_info` 
where DATE(starttime) >= DATE(timestamp('2018-04-01')) AND DATE(starttime) <= DATE(timestamp('2018-06-01'))
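
The `WHERE` clause printed above keeps trips whose start date falls between 2018-04-01 and 2018-06-01, with both bounds inclusive, so trips that started at any time on 1 June are still returned. A minimal sketch of the same predicate in plain Python follows; the helper name `in_query_window` and the sample timestamps are made up for illustration and are not rows from the table.

```python
from datetime import date, datetime

# Bounds copied from the query's WHERE clause; both ends are inclusive.
WINDOW_START = date(2018, 4, 1)
WINDOW_END = date(2018, 6, 1)


def in_query_window(starttime: datetime) -> bool:
    """Mirror DATE(starttime) >= start AND DATE(starttime) <= end."""
    # .date() drops the time of day, as DATE() does in the SQL.
    return WINDOW_START <= starttime.date() <= WINDOW_END


# Made-up timestamps for illustration only.
samples = [
    datetime(2018, 3, 31, 23, 59, 59),  # day before the window
    datetime(2018, 4, 1, 0, 0, 0),      # first instant of the window
    datetime(2018, 6, 1, 23, 59, 59),   # last day is still included
    datetime(2018, 6, 2, 0, 0, 0),      # day after the window
]

flags = [in_query_window(ts) for ts in samples]
for ts, keep in zip(samples, flags):
    print(ts.isoformat(), keep)
```

If the intent was to cover only April and May, the upper bound would need to be 2018-05-31, or a strict `<` comparison against 2018-06-01.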


# Result set loaded from BigQuery job as a DataFrame
Query results are read from the existing BigQuery job by its job ID, so the
query does not need to be re-run to explore them. The ```to_dataframe```
[method](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe)
downloads the results to a pandas DataFrame by using the BigQuery Storage API.

You can edit the query in the BigQuery SQL editor or in the
```Optional:``` sections below.

In [3]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_24d548eb_1843947b048') # Job ID inserted based on the query results selected to explore
results = job.to_dataframe()
results



,tripduration,starttime,stoptime,start_station_id,start_station_name,start_station_latitude,start_station_longitude,end_station_id,end_station_name,end_station_latitude,...,eightd_has_key_dispenser,eightd_has_available_keys,num_bikes_available,num_docks_available,num_bikes_disabled,num_docks_disabled,is_renting,is_returning,is_installed,last_reported
0,448,2018-04-13 02:55:44.723,2018-04-13 03:03:13.610,332,Cherry St,40.712199,-73.979481,361,Allen St & Hester St,40.716059,...,False,False,25,8,3,0,True,True,True,2021-11-09
1,410,2018-05-31 19:00:03.252,2018-05-31 19:06:53.535,332,Cherry St,40.712199,-73.979481,307,Canal St & Rutgers St,40.714275,...,False,False,25,8,3,0,True,True,True,2021-11-09
2,1906,2018-04-28 12:19:42.730,2018-04-28 12:51:29.566,332,Cherry St,40.712199,-73.979481,309,Murray St & West St,40.714979,...,False,False,25,8,3,0,True,True,True,2021-11-09
3,463,2018-05-26 10:36:09.603,2018-05-26 10:43:52.904,332,Cherry St,40.712199,-73.979481,349,Rivington St & Ridge St,40.718502,...,False,False,25,8,3,0,True,True,True,2021-11-09
4,2306,2018-05-02 09:53:52.951,2018-05-02 10:32:19.268,332,Cherry St,40.712199,-73.979481,502,Henry St & Grand St,40.714215,...,False,False,25,8,3,0,True,True,True,2021-11-09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2624905,2100,2018-05-08 14:18:33.607,2018-05-08 14:53:34.158,3374,Central Park North & Adam Clayton Powell Blvd,40.799484,-73.955613,3458,W 55 St & 6 Ave,40.763094,...,False,False,32,1,3,0,True,True,True,2021-11-09
2624906,2925,2018-05-03 16:22:10.474,2018-05-03 17:10:55.862,3374,Central Park North & Adam Clayton Powell Blvd,40.799484,-73.955613,3137,5 Ave & E 73 St,40.772828,...,False,False,32,1,3,0,True,True,True,2021-11-09
2624907,1435,2018-04-01 15:05:15.849,2018-04-01 15:29:11.518,3374,Central Park North & Adam Clayton Powell Blvd,40.799484,-73.955613,3226,W 82 St & Central Park West,40.782750,...,False,False,32,1,3,0,True,True,True,2021-11-09
2624908,9216,2018-04-14 11:08:34.564,2018-04-14 13:42:11.170,3374,Central Park North & Adam Clayton Powell Blvd,40.799484,-73.955613,3374,Central Park North & Adam Clayton Powell Blvd,40.799484,...,False,False,32,1,3,0,True,True,True,2021-11-09


## Show descriptive statistics using describe()
Use the ```pandas DataFrame.describe()```
[method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html)
to generate descriptive statistics. Descriptive statistics include those that
summarize the central tendency, dispersion and shape of a dataset’s
distribution, excluding ```NaN``` values. You may also use other Python methods
to interact with your data.
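
As a rough illustration of what these statistics are, the following sketch recomputes them for one small, made-up column using only the Python standard library. The function name `describe_column` and the sample values are hypothetical. It uses the sample standard deviation and linearly interpolated quartiles, which correspond to the pandas defaults, and it drops ```NaN``` entries before computing anything.

```python
import math
import statistics


def describe_column(values):
    """Summarize one numeric column, skipping NaN entries."""
    clean = sorted(v for v in values if not math.isnan(v))
    # method="inclusive" interpolates linearly between data points.
    q1, q2, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "count": len(clean),
        "mean": statistics.mean(clean),
        "std": statistics.stdev(clean),  # sample standard deviation
        "min": clean[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": clean[-1],
    }


# Made-up trip durations in seconds, with one missing value.
sample = [300.0, 450.0, float("nan"), 600.0, 900.0, 1500.0]
summary = describe_column(sample)
for name, value in summary.items():
    print(f"{name:>5}: {value}")
```

Note that `count` reports only the non-missing entries, so comparing it with the number of rows is a quick way to see how many values a column is missing.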

In [4]:
results.describe()

,tripduration,start_station_id,start_station_latitude,start_station_longitude,end_station_id,end_station_latitude,end_station_longitude,bikeid,birth_year,station_id,region_id,capacity,num_bikes_available,num_docks_available,num_bikes_disabled,num_docks_disabled
count,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0,2624910.0
mean,1072.139,1544.33,40.73686,-73.98245,1583.527,40.73664,-73.98261,26228.08,1978.443,1544.33,71.0,42.55252,23.54965,16.40393,2.039725,0.7090262
std,22837.64,1420.908,0.03120878,0.0193609,1434.049,0.03082768,0.01926291,5815.758,11.85696,1420.908,0.0,18.7905,19.18391,16.72578,2.970601,4.674512
min,61.0,72.0,40.6554,-74.02535,72.0,40.6554,-74.0686,14529.0,1885.0,72.0,71.0,0.0,0.0,0.0,0.0,0.0
25%,365.0,372.0,40.71723,-73.99517,380.0,40.71723,-73.9953,20601.0,1969.0,372.0,71.0,30.0,8.0,2.0,1.0,0.0
50%,631.0,496.0,40.7365,-73.98621,505.0,40.73705,-73.98565,27929.0,1980.0,496.0,71.0,39.0,21.0,11.0,2.0,0.0
75%,1118.0,3177.0,40.75971,-73.97188,3236.0,40.75757,-73.97208,31091.0,1988.0,3177.0,71.0,54.0,34.0,27.0,3.0,0.0
max,10283680.0,3686.0,40.81439,-73.90774,3686.0,40.81439,-73.90774,33690.0,2002.0,3686.0,71.0,91.0,79.0,77.0,57.0,43.0
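
A few figures in this summary are easier to judge after some arithmetic. The sketch below converts the maximum `tripduration` (in seconds) to days, compares it with the median, and derives the rider age implied by the minimum `birth_year`. The input values are copied from the table above; the two plausibility thresholds are hypothetical and serve only to illustrate the check.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Values copied from the describe() output above.
tripduration_max_s = 10_283_680
tripduration_median_s = 631
birth_year_min = 1885
trip_year = 2018  # the query window covers April to June 2018

longest_trip_days = tripduration_max_s / SECONDS_PER_DAY
max_to_median_ratio = tripduration_max_s / tripduration_median_s
oldest_rider_age = trip_year - birth_year_min

# Hypothetical thresholds, chosen only to illustrate the check.
MAX_PLAUSIBLE_TRIP_S = SECONDS_PER_DAY
MAX_PLAUSIBLE_AGE = 100

duration_suspicious = tripduration_max_s > MAX_PLAUSIBLE_TRIP_S
birth_year_suspicious = oldest_rider_age > MAX_PLAUSIBLE_AGE

print(f"longest trip: {longest_trip_days:.1f} days")
print(f"max / median duration: {max_to_median_ratio:.0f}x")
print(f"oldest implied rider age: {oldest_rider_age}")
print("duration worth a second look:", duration_suspicious)
print("birth year worth a second look:", birth_year_suspicious)
```

Under these assumed thresholds both columns contain values that may deserve inspection before further analysis, which is also consistent with the mean `tripduration` (about 1072 s) sitting well above the median (631 s).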
