# Introduction
<font color='orange'>[Google Colab]</font> In Part I, we explored the traffic image API and mapped out the locations of the cameras in the country.

We then acquired images from a few traffic cameras for inspection.

In this Part, this is what you'll do:
1. Import libraries
2. Collect two months' worth of traffic image JSON (January 2019 and January 2020)
3. Combine the JSON into a single DataFrame
4. Filter for camera ID 1709 only
5. Export DataFrame for Part III

<strong>Apart from coding, data collection alone will take <font color='red'>2 hours or so</font>. Make sure you allocate plenty of time for this Part.</strong>

### Step 1: Import libraries
We'll start off with importing the libraries that we need:
- requests
- pandas as pd
- datetime
- time

In [None]:
# Step 1: Import libraries

# Testing collection
Before we collect data en masse, we should collect the data bit by bit first and measure the time taken to assess the right strategy.

We'll start by making a few simple calls, then build up to more while keeping track of how long these tasks take.

### Step 2: Create a range of dates and time
We'll be retrieving a day's worth of data first, with an interval of 1 minute. 

We use a 1-minute interval because the API documentation recommends it. Use pandas' date_range to get a range of times between "2019/01/01" and "2019/01/02", with a frequency of 1 minute.

![DataRangeExample](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectComputerVisionTraffic/DataRangeOneDay.png)

Your list will be 1,441 items long.
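
As a minimal sketch of this step, `pd.date_range` with a one-minute frequency produces the range. The variable name `date_range` is only an illustration, not a name the notebook requires.

```python
import pandas as pd

# Both endpoints are included, so one day at 1-minute steps
# gives 24 * 60 + 1 timestamps.
date_range = pd.date_range(start="2019/01/01", end="2019/01/02", freq="min")

print(len(date_range))
print(date_range[0], date_range[-1])
```

The extra item beyond 1,440 comes from the end point, midnight on 2019/01/02, being part of the range.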

In [None]:
# Step 2: Get a range of dates and time

In [None]:
# Optional: Check length of list

### Step 3: Format the datetime into a string
Now that you have a list of datetime items, you'll have to structure them into the right format that you can use. 

Call the strftime method with the right format string to produce the string output.

In this Step, try to print a string from the first datetime in the list.

```
2019-01-01 00:00:00 

to 

2019-01-01T00%3A00%3A00
```

Something like this.
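
Two possible ways to get that output are sketched below, using a plain `datetime` for illustration (pandas timestamps also have a `strftime` method). The variable names are assumptions made for this example.

```python
from datetime import datetime
from urllib.parse import quote

first = datetime(2019, 1, 1, 0, 0, 0)

# Option 1: write the encoded colon directly.
# "%%" is a literal "%" in a strftime format string.
encoded = first.strftime("%Y-%m-%dT%H%%3A%M%%3A%S")

# Option 2: format with normal colons, then percent-encode them.
encoded_alt = quote(first.strftime("%Y-%m-%dT%H:%M:%S"))

print(encoded)
print(encoded == encoded_alt)
```

Either option works; the second avoids having to remember that `%3A` is the encoded form of a colon.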

In [None]:
# Step 3: Format the first datetime in list to string

### Step 4: Call API for the list of datetimes
Now that you've figured out how to format the datetime into a string, it's time to loop through them and make repeated calls and store the JSON responses in a list.

You should use the <strong>time</strong> library to measure how long it takes to call the API for all 1,441 timestamps.

You should end up with a list containing 1,441 JSON objects.

<font color = 'red'>Allocate 30-40 minutes for this task.</font>

<details>
  <summary>Click here once if you're unsure and need pseudocode</summary>
  <ol>
    <li>Declare a variable containing an empty list</li>
    <li>Declare a variable containing a base URL for the API</li>
    <li>Use a for loop over the list you got from Step 2. In each loop:</li>
    <ol>
      <li>Declare a temporary URL consisting of the base URL and the current datetime in the list, formatted as in Step 3</li>
      <li>Make a GET request using the URL and store the response in a variable</li>
      <li>Call the .json method on the response and store the JSON object in a new variable</li>
      <li>Append the variable to the list that you declared earlier</li>
    </ol>
  </ol>
</details>
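
The loop-and-time pattern can be sketched as follows. To keep the sketch runnable offline, a stand-in function replaces the real request and the base URL is a placeholder; `collect`, `fake_fetch` and `base_url` are all hypothetical names.

```python
import time

def collect(urls, fetch):
    """Call fetch(url) for every URL in order and time the whole run."""
    results = []
    start = time.perf_counter()
    for url in urls:
        results.append(fetch(url))
    elapsed = time.perf_counter() - start
    return results, elapsed

def fake_fetch(url):
    # Offline stand-in. In the notebook this could be something like
    # requests.get(url).json()
    return {"url": url}

# Placeholder only: use the endpoint from Part I here.
base_url = "https://example.invalid/traffic-images?date_time="
stamps = ["2019-01-01T00%3A00%3A00", "2019-01-01T00%3A01%3A00"]
urls = [base_url + stamp for stamp in stamps]

json_list, seconds = collect(urls, fake_fetch)
print(len(json_list), f"{seconds:.4f}s")
```

Timing a small batch first gives a per-call estimate that you can multiply up before committing to the full day.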

In [None]:
# Step 4: Call API and get a list of JSON objects

### Step 5: Combine all the JSON into a DataFrame
With your list of JSON objects, turn each of them into a DataFrame and append it to another list.

![ConcatenatedDataFrameOneDay](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectComputerVisionTraffic/ConcatenatedDataFrameOneDay.png)

With the list of DataFrames, concatenate them so that they'll end up as a single DataFrame.

You are anticipating:
- 124,874 rows
- 8 columns
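
A rough sketch of the combine step follows. The `sample_json` structure is invented for illustration and is far smaller than a real response, so inspect one of your own JSON objects to find where the camera records live before adapting it.

```python
import pandas as pd

# Hypothetical response shape, made up for this example only.
sample_json = {
    "items": [
        {
            "timestamp": "2019-01-01T00:00:00+08:00",
            "cameras": [
                {"camera_id": "1709", "image": "https://example.invalid/a.jpg",
                 "timestamp": "2018-12-31T23:59:40+08:00"},
                {"camera_id": "1501", "image": "https://example.invalid/b.jpg",
                 "timestamp": "2018-12-31T23:59:41+08:00"},
            ],
        }
    ]
}
json_list = [sample_json, sample_json]

# One DataFrame per JSON object, then a single concatenation at the end.
df_list = [pd.DataFrame(obj["items"][0]["cameras"]) for obj in json_list]
df = pd.concat(df_list, ignore_index=True)

print(df.shape)
```

Concatenating once at the end is generally cheaper than growing a DataFrame inside the loop. If the camera records contain nested fields, `pd.json_normalize` may be a better fit than `pd.DataFrame`.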

In [None]:
# Step 5: Combine the list of JSON into a DataFrame

### Step 6: Filter the DataFrame for camera 1709 only
Your huge DataFrame currently contains the URLs for the images from ALL cameras in a single day.

Filter the DataFrame to contain only rows from <strong>camera_id = 1709</strong> so we can see how many rows there are in a single day. 

We should expect:
- 1,441 rows
- 8 columns
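
Filtering can be done with a boolean mask, sketched here on a toy DataFrame. Whether `camera_id` arrives as a string or an integer is an assumption to check in your own data; casting to string covers both cases.

```python
import pandas as pd

# Toy data standing in for the one-day DataFrame.
df = pd.DataFrame({
    "camera_id": ["1709", "1501", "1709"],
    "image": ["a.jpg", "b.jpg", "a.jpg"],
})

# astype(str) makes the comparison work for string or integer IDs.
df_1709 = df[df["camera_id"].astype(str) == "1709"]

print(df_1709.shape)
```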

In [None]:
# Step 6: Filter for 1709

### Step 7: Drop duplicates based on 'image'
If you observe carefully, there are some rows that are repeated.

Drop duplicates by 'image' column and see how many rows you end up with.
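
A small sketch of de-duplicating on one column, using toy data in place of the filtered DataFrame:

```python
import pandas as pd

# Toy data: the first two rows point at the same image.
df_1709 = pd.DataFrame({
    "camera_id": ["1709", "1709", "1709"],
    "image": ["a.jpg", "a.jpg", "b.jpg"],
})

# subset="image" compares rows on that column only; the first occurrence is kept.
deduped = df_1709.drop_duplicates(subset="image")

print(len(df_1709), "->", len(deduped))
```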

In [None]:
# Step 7: Drop duplicates by 'image' column

# Full data collection
Now that we have figured out how to collect one day's worth of data, we can collect two one-month datasets:
- 2019/01/01 to 2019/02/01
- 2020/01/01 to 2020/02/01

Wait a minute. Do you notice something is off?

<details>
  <summary>Click here to check if your guess is right</summary>
  <p><strong>If doing one day's worth of API calling took 30 minutes, doing 2 months' worth of API calling means it will take 30 mins per day's JSON x 30 days per month x 2 months = 1800 minutes.</strong></p>
  <p><strong>That's 30 hours!</strong></p>
</details>

Don't worry - we'll be using multithreading to make multiple calls at the same time.

Reading: https://docs.python.org/3/library/concurrent.futures.html (scroll to ThreadPoolExecutor Example)

### Step 8: Set up date_ranges
Let's just set up the range of dates we need to collect data on.

Declare two variables containing the two date ranges we want, in minute frequency.
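
As a sketch, assuming each range runs from 1 January to 1 February of its year, the two variables could look like this (the names are illustrative):

```python
import pandas as pd

date_range_2019 = pd.date_range(start="2019/01/01", end="2019/02/01", freq="min")
date_range_2020 = pd.date_range(start="2020/01/01", end="2020/02/01", freq="min")

# 31 days * 1,440 minutes, plus the inclusive end point.
print(len(date_range_2019), len(date_range_2020))
```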



In [None]:
# Step 8a: Set up date range for 2019

In [None]:
# Step 8b: Set up date range for 2020

### Step 9: Define retrieveCameraJSON
Define a function called retrieveCameraJSON that takes one argument, date_time.

This function returns the JSON from one API call.

In short, it does what you did in Step 4, but without the for loop.

<details>
<summary>Click here for pseudocode if you need help</summary>
  <ol>
    <li><strong>Define</strong> retrieveCameraJSON, taking in one argument called date_time</li>
    <li>In the function definition</li>
    <ul>
    <li>Declare a variable that contains the base URL, just before date_time=</li>
    <li>Declare a variable that takes the base URL and combines it with the formatted date_time argument (refer to Step 3)</li>
    <li>Make a GET request to get the response of the API call</li>
    <li>Declare a variable to store the JSON data of the response</li>
    <li><strong>Return</strong> the variable</li>
    </ul>
  </ol>
</details>
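
One possible shape for the function is sketched below. `BASE_URL` is a placeholder rather than the real endpoint, and the `build_url` helper and the 30-second timeout are assumptions added for this example.

```python
from datetime import datetime

import requests

# Placeholder only: replace with the endpoint you used in Part I.
BASE_URL = "https://example.invalid/traffic-images?date_time="

def build_url(date_time):
    """Combine the base URL with the encoded timestamp (see Step 3)."""
    return BASE_URL + date_time.strftime("%Y-%m-%dT%H%%3A%M%%3A%S")

def retrieveCameraJSON(date_time):
    """Return the parsed JSON for a single timestamp."""
    response = requests.get(build_url(date_time), timeout=30)
    return response.json()

# Building the URL needs no network access, so it is safe to check first.
print(build_url(datetime(2019, 1, 1)))
```

Keeping the URL construction in its own helper lets you verify the format before making any real calls.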

In [None]:
# Step 9: Define retrieveCameraJSON

### Step 10: Import library
Since we're doing concurrency, we'll need:
- futures from concurrent

In [None]:
# Step 10: Import futures

### Step 11: Use concurrency to retrieve JSON data from 2019
Now that we have defined retrieveCameraJSON, let's make our API calls.

With reference to the <a href="https://docs.python.org/3/library/concurrent.futures.html">reading</a> provided above, you'll make lots of requests concurrently.

If this is your first time making concurrent calls, don't worry. Try:
1. Using a small number of rows to see if you got it right
2. Adapting the code from the reading

<font color='red'><strong>Reserve around 60 minutes for this Step.</strong></font>

<details>
<summary>Click here once for the pseudocode if you're stuck</summary>
  <ol>
    <li>Declare an empty list to store the <strong>.result</strong> of your completed futures</li>
    <li>Use a <strong>with</strong> statement with futures.ThreadPoolExecutor, with a max_workers of 150 as an <strong>executor</strong></li>
    <li>Declare a variable containing a list of the futures of retrieveCameraJSON with the date, <strong>for</strong> each date in the 2019 date range</li>
    <li>Use a <strong>for</strong> loop for the .as_completed list of futures</li>
    <ul>
      <li>Append the .result() of each future in the list to the list that you declared at the top</li>
    </ul>
  </ol>
</details>

<br>

<details>
    <summary><font color = 'green'>SPOILERS! Click once for a redacted code block if you're really really really stuck.</font></summary>
    <div>
        <img src = 'https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectComputerVisionTraffic/ConcurrentFunctionCalls.png'>
    </div>
</details>
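
The pattern can be tried offline first, as in this sketch. `fake_retrieve` is a hypothetical stand-in for retrieveCameraJSON, and the three timestamps and `max_workers=3` are arbitrary small values chosen for the test run.

```python
from concurrent import futures

def fake_retrieve(date_time):
    """Offline stand-in for retrieveCameraJSON."""
    return {"requested": date_time}

date_times = ["2019-01-01T00:00:00", "2019-01-01T00:01:00", "2019-01-01T00:02:00"]

results = []
with futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules one call per timestamp and returns a Future immediately.
    future_list = [executor.submit(fake_retrieve, dt) for dt in date_times]
    # as_completed() yields futures as they finish, not in submission order.
    for future in futures.as_completed(future_list):
        results.append(future.result())

print(len(results))
```

Once this works on a handful of timestamps, swapping in the real function and the full date range is a small change.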

In [None]:
# Step 11a: Make concurrent API calls for 2019 data

In [None]:
# Step 11b: Get length of list of JSON results

### Step 12: Repeat Steps 5-7 to get 2019 Jan DataFrame
Now that you have a long list of JSON objects, it's time to turn them into a DataFrame. 

Repeat what you did earlier and get a huge DataFrame containing traffic image URLs for 2019 Jan.

You should expect around <strong>30,000 rows</strong> after dropping duplicates and filtering for 1709 in the end.

In [None]:
# Step 12: Get the 2019 Jan DataFrame

### Step 13: Sort the DataFrame by timestamp
Using concurrency means that the JSON objects in the list are not ordered chronologically.

As such, you'll have to sort the DataFrame by timestamp.
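
A short sketch of the sort on toy data. It assumes the timestamps are ISO-formatted strings with the same UTC offset, in which case sorting them as text matches chronological order; otherwise convert the column with `pd.to_datetime` first.

```python
import pandas as pd

# Toy rows deliberately out of order.
df = pd.DataFrame({
    "timestamp": [
        "2019-01-01T00:02:00+08:00",
        "2019-01-01T00:00:00+08:00",
        "2019-01-01T00:01:00+08:00",
    ],
    "image": ["c.jpg", "a.jpg", "b.jpg"],
})

# reset_index(drop=True) renumbers the rows after sorting.
df_sorted = df.sort_values("timestamp").reset_index(drop=True)

print(df_sorted["image"].tolist())
```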

In [None]:
# Step 13: Sort the DataFrame by timestamp

### Step 14: Export the DataFrame into a CSV file
Once you're done with sorting, export your hard work into a CSV file in your Google Drive.
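
The export itself is a single `to_csv` call, sketched below. The temporary directory is only a stand-in so the example runs anywhere, and the file name is hypothetical. In Colab you would typically mount Drive first (for example with `google.colab.drive.mount`) and point the path at a folder inside it.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2019-01-01T00:00:00+08:00", "2019-01-01T00:01:00+08:00"],
    "image": ["a.jpg", "b.jpg"],
})

# Stand-in for a Google Drive folder.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "traffic_1709_2019_01.csv")

# index=False avoids writing the row numbers as an extra column.
df.to_csv(csv_path, index=False)

# Reading the file back is a quick check that the export worked.
round_trip = pd.read_csv(csv_path)
print(round_trip.shape)
```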

In [None]:
# Step 14: Export the DataFrame into CSV

### Step 15: Repeat Steps 11-14 for date range in 2020
Now that you're done with 2019, time to collect data for 2020 Jan.

Make the same concurrent calls but with the 2020 date range that you set up earlier.

You might get a different number depending on the API's behaviour, but expect around 29,900 rows at the end after removing duplicate entries.

<font color='red'><strong>Allocate another hour for this Step.</strong></font>

In [None]:
# Step 15: Repeat Steps 11-14 for 2020 data

# End of Part II
What a long Part. In this Part, you successfully made multiple API calls and obtained two months' worth of data.  

In the next Part, we will retrieve the images from the URLs in the DataFrame for both years. We will also set up Google Colab for object detection with a GPU.

The next part is quite long as well so prepare yourself.