# Welcome to Projects in Programming & Data Science! 

---

# Laptop Setup

### NYU Classes

Make sure you can access our course page on NYU Classes. This is where I will post all of our course content (notebooks, datasets, etc.), and where all of your assignments will be posted, graded, and returned.

### Colab

To standardize the way we all use Jupyter Notebook (described below), we are going to use Google's Colab (https://colab.research.google.com/notebooks/welcome.ipynb).

Think of using Colab as renting a computer through your web browser (I recommend Chrome). This matters for saving your work: you need to download your notebook from Colab onto your own computer and re-upload it the next time you'd like to work with it.

For instance, if I want to open today's class notebook in Colab, I: 

<br>

1. **Will go to https://colab.research.google.com/notebooks/welcome.ipynb**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.51.42%20PM.png" width=600  />
</div>

2. **Will click "File" > "Upload notebook"**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.54.00%20PM.png" width=600  />
</div>

3. **Will click "Upload", then "Choose File", and select the notebook I downloaded from NYU Classes. (A Jupyter Notebook always has a .ipynb extension.)**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.54.49%20PM.png" width=600  />
</div>

4. **Once I click "Open" ...**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.55.06%20PM.png" width=600  />
</div>

5. **Colab should open the notebook, and voila!**

<div> 
    <img src="attachment:Screen%20Shot%202019-10-23%20at%201.55.27%20PM.png" width=600  />
</div>

---

Today we're going to jump right into the mix and use the CitiBike API to populate a SQLite database at regular intervals. Consider this your warm-up for the semester!

[SQLite](https://www.sqlite.org/index.html) is a library that lets us create, populate, and query a SQL database. It's also serverless, meaning we don't need to connect to a separate database server; instead, our program reads and writes the database directly. We can even store that database as a single file on our local machine and open it whenever we need it.
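As a minimal sketch of what "serverless" means in practice, Python's built-in `sqlite3` module can create a database file, write to it, and read it back with no server running. The temporary path and the `demo` table are stand-ins for illustration only; in class the file is citibikeData.db.

```python
import os
import sqlite3
import tempfile

# Hypothetical location for this demo (in class, the file is citibikeData.db)
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# connect() creates the file if it doesn't exist yet; no server process is started
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE IF NOT EXISTS demo (x int)")
con.execute("INSERT INTO demo VALUES (?)", (42,))
con.commit()
con.close()

# Reopening the same file later gives us back the stored data
con = sqlite3.connect(db_path)
rows = con.execute("SELECT x FROM demo").fetchall()
con.close()
print(os.path.exists(db_path), rows)  # True [(42,)]
```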

Now, let's check out the API we'll be working with: http://gbfs.citibikenyc.com/gbfs/gbfs.json

---

First, we'll request the JSON from the CitiBike API URL and print it out to get a quick glimpse of what it contains.
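The request cell itself isn't shown here, so this is a minimal sketch using only the standard library (`urllib.request` and `json`); the original notebook may well have used the `requests` package instead. It assumes the feed follows the GBFS layout, where gbfs.json lists the other feeds (such as `station_status`) under `["data"]["en"]["feeds"]`. The helper names `fetch_json` and `find_feed_url` are hypothetical.

```python
import json
import urllib.request

GBFS_URL = "http://gbfs.citibikenyc.com/gbfs/gbfs.json"


def fetch_json(url, timeout=10):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def find_feed_url(discovery, name, language="en"):
    """Look up one feed's URL in a GBFS discovery document (assumed GBFS layout)."""
    for feed in discovery["data"][language]["feeds"]:
        if feed["name"] == name:
            return feed["url"]
    raise KeyError(f"feed {name!r} is not listed")


# Usage (needs network access):
# discovery = fetch_json(GBFS_URL)
# print(discovery)
# status = fetch_json(find_feed_url(discovery, "station_status"))
```

Nothing is downloaded until `fetch_json` is called, so the helpers can be defined in one cell and used in the next.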

As you can see, the JSON is a dictionary of lists and nested dictionaries containing information about CitiBike stations across New York City.
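The per-station records sit a couple of levels down in that structure. A minimal sketch, assuming the station data comes from the GBFS `station_status` feed, whose payload nests a list of stations under `["data"]["stations"]`. The tiny `sample_status` dict below is invented to show the shape and is not real CitiBike data.

```python
# Invented payload with the same nesting as a GBFS station_status response
sample_status = {
    "last_updated": 1571850000,
    "ttl": 10,
    "data": {
        "stations": [
            {"station_id": "72", "num_bikes_available": 5, "num_docks_available": 30},
            {"station_id": "79", "num_bikes_available": 0, "num_docks_available": 33},
        ]
    },
}

# Top level is a dict, "data" is another dict, and "stations" is a list of dicts
clean_data = sample_status["data"]["stations"]

for entry in clean_data:
    print(entry["station_id"], entry["num_bikes_available"])
# 72 5
# 79 0
```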

So, we have our data from the CitiBike feed, and it looks pretty good! Now we need to create a table within our database (the one we named citibikeData.db). We do that using the 'CREATE TABLE IF NOT EXISTS' statement seen below. 

In that statement, 'IF NOT EXISTS' means the table called 'StationsData' is only created if it doesn't already exist. That way, if we run the cell again, SQLite leaves the existing table and its data untouched instead of raising a "table already exists" error.

Note that at this point we aren't adding any data to our table. All we're doing is telling SQLite that we want to create a new table, and providing it with a) the column names and b) the data type those columns should be expecting.

```python

import sqlite3

# Open (or create) the database file; skip this line if `con` was already opened in an earlier cell
con = sqlite3.connect("citibikeData.db")

sql = "CREATE TABLE IF NOT EXISTS StationsData (station_id int, num_ebikes_available int, num_bikes_available int, is_installed int, last_reported int, num_docks_disabled int, is_renting int, eightd_has_available_keys varchar(250), num_docks_available int, num_bikes_disabled int, legacy_id int, station_status varchar(250), is_returning int);"

con.execute(sql)
con.commit()

```
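To see why 'IF NOT EXISTS' matters, here is a small sketch against a throwaway in-memory database. The table name `Demo` and its two columns are hypothetical, trimmed down from StationsData. Re-running the guarded statement is a no-op, while a plain 'CREATE TABLE' on an existing name fails rather than overwriting anything.

```python
import sqlite3

# In-memory database so this demo leaves citibikeData.db alone
con = sqlite3.connect(":memory:")
create_sql = "CREATE TABLE IF NOT EXISTS Demo (station_id int, station_status varchar(250));"

con.execute(create_sql)
con.execute("INSERT INTO Demo VALUES (?, ?);", (72, "active"))
con.commit()

# Re-running the same statement does nothing: no error, and the existing row survives
con.execute(create_sql)
row_count = con.execute("SELECT COUNT(*) FROM Demo;").fetchone()[0]
print(row_count)  # 1

# Without IF NOT EXISTS, creating a table that already exists raises an error
error_message = None
try:
    con.execute("CREATE TABLE Demo (station_id int);")
except sqlite3.OperationalError as err:
    error_message = str(err)
print(error_message)  # table Demo already exists
```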

Now that we have our database and our table, we want to insert our data. 

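Below, we create a "query template" that uses "INSERT OR IGNORE INTO" to add a value for each of the columns in our table (StationsData). Each "?" is a placeholder that SQLite fills in with the matching item from the tuple of values we pass alongside the template.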

We get those values by parsing the CitiBike JSON we requested earlier: for each station entry in that JSON, we create a new row in our SQLite table.

```python

# Assumes `con` (the open sqlite3 connection) and `clean_data` (the list of
# station dicts parsed from the CitiBike JSON) were defined in earlier cells.
query_template = """INSERT OR IGNORE INTO StationsData(station_id, num_ebikes_available, num_bikes_available,
is_installed, last_reported, num_docks_disabled, is_renting, eightd_has_available_keys,
num_docks_available, num_bikes_disabled, legacy_id, station_status, is_returning)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);"""

for entry in clean_data:  # for every station entry in the JSON
    station_id = int(entry['station_id']) # find and set station_id
    num_ebikes_available = int(entry['num_ebikes_available'])
    num_bikes_available = int(entry['num_bikes_available'])
    is_installed = int(entry['is_installed'])
    last_reported = int(entry['last_reported'])
    num_docks_disabled = int(entry['num_docks_disabled'])
    is_renting = int(entry['is_renting'])
    eightd_has_available_keys = str(entry['eightd_has_available_keys'])
    num_docks_available = int(entry['num_docks_available'])
    num_bikes_disabled = int(entry['num_bikes_disabled'])
    legacy_id = int(entry['legacy_id'])
    station_status = str(entry['station_status'])
    is_returning = int(entry['is_returning'])
                           
    print("Inserting Station:", station_id, num_ebikes_available, num_bikes_available, is_installed, last_reported, num_docks_disabled, is_renting, eightd_has_available_keys, num_docks_available, num_bikes_disabled, legacy_id, station_status, is_returning) 
    
    query_parameters = (station_id, num_ebikes_available, num_bikes_available, is_installed, last_reported, num_docks_disabled, is_renting, eightd_has_available_keys, num_docks_available, num_bikes_disabled, legacy_id, station_status, is_returning) 
    
    con.execute(query_template, query_parameters)
    
con.commit()

```
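Note that "INSERT OR IGNORE" only skips a row when inserting it would violate a constraint such as UNIQUE or PRIMARY KEY. Because StationsData declares neither, every run of the loop above appends a fresh row for every station. If you wanted to avoid storing the same report twice, one option (not part of the original design) is a UNIQUE(station_id, last_reported) constraint. This sketch uses a hypothetical, trimmed-down table `StatusDemo` with invented values, and `executemany` to insert a whole batch in one call.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical table: each station may appear once per last_reported timestamp
con.execute(
    "CREATE TABLE IF NOT EXISTS StatusDemo ("
    "station_id int, last_reported int, num_bikes_available int, "
    "UNIQUE(station_id, last_reported));"
)
# Same columns, but no constraint, like StationsData
con.execute("CREATE TABLE IF NOT EXISTS NoConstraint (station_id int, last_reported int, num_bikes_available int);")

rows = [
    (72, 1000, 5),
    (79, 1000, 0),
    (72, 1000, 5),  # same station and timestamp: skipped where UNIQUE applies
    (72, 1015, 4),  # new timestamp for station 72: always inserted
]
con.executemany("INSERT OR IGNORE INTO StatusDemo VALUES (?, ?, ?);", rows)
con.executemany("INSERT OR IGNORE INTO NoConstraint VALUES (?, ?, ?);", rows)
con.commit()

stored = con.execute("SELECT COUNT(*) FROM StatusDemo;").fetchone()[0]
unconstrained = con.execute("SELECT COUNT(*) FROM NoConstraint;").fetchone()[0]
print(stored, unconstrained)  # 3 4
```

For a time series of snapshots, keeping every row (as StationsData does) can be exactly what you want; the constraint only matters if the same report might be fetched twice.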

Now we can use [pd.read_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html) to check that we are properly connected to our database and that the StationsData table within it holds our data.
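The check cell isn't shown here; below is a sketch assuming `pandas` is installed. To stay self-contained it builds a small in-memory StationsData table with only three of the real columns and invented rows. Against the class database you would pass the existing `con` opened on citibikeData.db instead.

```python
import sqlite3

import pandas as pd

# Stand-in connection and trimmed-down table; in the notebook, reuse the real `con`
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE IF NOT EXISTS StationsData (station_id int, num_bikes_available int, last_reported int);")
con.executemany("INSERT INTO StationsData VALUES (?, ?, ?);", [(72, 5, 1000), (79, 0, 1000)])
con.commit()

# read_sql accepts a sqlite3 connection directly and returns a DataFrame
df = pd.read_sql("SELECT * FROM StationsData LIMIT 5;", con)
print(df.shape)  # (2, 3)
print(df.columns.tolist())  # ['station_id', 'num_bikes_available', 'last_reported']
```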

Looks good! Last but not least, let's set things up so that our database automatically updates every 15 seconds. 
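The cell that schedules the updates isn't shown here. One simple approach, sketched below under that caveat, is a loop that refreshes the data and then sleeps for 15 seconds. `poll` and `refresh` are hypothetical names: `refresh` stands for whatever re-requests the CitiBike feed, runs the INSERT loop above, and commits. The `sleep` parameter exists only so the loop can be tested without actually waiting.

```python
import time


def poll(refresh, interval_seconds=15, max_runs=None, sleep=time.sleep):
    """Call refresh() repeatedly, pausing interval_seconds between calls.

    With max_runs=None the loop runs until interrupted (e.g. stopping the cell);
    otherwise it stops after max_runs calls. Returns how many calls were made.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        refresh()
        runs += 1
        # Don't sleep after the final run when a limit is set
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Usage (hypothetical refresh step):
# def refresh():
#     ...  # re-request the feed, run the INSERT loop above, then con.commit()
# poll(refresh)  # runs every ~15 seconds until you interrupt the cell
```

Because the loop sleeps after each refresh, consecutive runs are 15 seconds apart plus however long the refresh itself takes; that drift is usually harmless for a class exercise.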

---

I hope this has helped you find your programming legs! Next week we'll get back to descriptive analytics using Python and Pandas. For now, take time to refresh yourself on the content covered in "Introduction to Programming". 

If you need a refresher on your SQL skills, check out the "Supplementary Info" directory in the class repo.