# Introduction
<font color='orange'>[Google Colab]</font> In Part II, we collected data for January 2019 and January 2020.
 
We ended up with two CSVs containing the traffic image URLs. We now need to retrieve the images from those URLs.
 
In this Part, we will:
1. Load the CSVs again into DataFrames
2. Write a function to download images into your Drive
3. Execute the function concurrently
4. Prepare for OpenCV GPU execution in Part IV
 
<font color="red"><strong>Allocate 4 hours for this Part.</strong></font>

# Test image collection
In this section, we will loop through the URLs in the 'image' column of the DataFrame and download a small test batch of images first.

### Step 1: Import libraries
First, let's import a few libraries to retrieve the images.
- pandas as pd
- os
- requests
- BytesIO from io
- Image from PIL

In [None]:
# Step 1: Import the libraries

### Step 2: Load both CSVs from Part II
Mount your Drive and load the two CSVs from Part II.

Each DataFrame should have around 29,990 to 30,000 rows and 8 columns.
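
As a rough sketch of this step, the helper below reads one CSV and prints its shape so you can compare it against the expected row and column counts. The name `load_image_csv` and the commented-out file paths are placeholders, not names from Part II, so substitute your own.

```python
import pandas as pd


def load_image_csv(csv_path):
    """Read one Part II CSV into a DataFrame and print its shape."""
    df = pd.read_csv(csv_path)
    print(f"{csv_path}: {df.shape[0]} rows, {df.shape[1]} columns")
    return df


# In Colab, mount your Drive before reading (uncomment when running there).
# The paths below are placeholders, replace them with your own.
# from google.colab import drive
# drive.mount('/content/drive')
# df_2019 = load_image_csv('/content/drive/My Drive/<your folder>/<your 2019 CSV>.csv')
# df_2020 = load_image_csv('/content/drive/My Drive/<your folder>/<your 2020 CSV>.csv')
```

If the printed shape is far from the expected numbers, check that you are pointing at the right file before moving on.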

In [None]:
# Step 2a: Read 2019 Jan CSV into a DataFrame

In [None]:
# Step 2b: Read 2020 Jan CSV into a DataFrame

### Step 3: Create folders in your Drive
You'll need to create two new folders in your Google Drive, in the same folder as this notebook:
1. car_image_2019_Jan
2. car_image_2020_Jan

These two folders will contain the images that you will retrieve from the traffic image URLs.

Your eventual folder structure should look something like this:

```
Google Drive folder (give it a name)
│   Project CV x Traffic (Part I).ipynb
│   Project CV x Traffic (Part II).ipynb
│   Project CV x Traffic (Part III).ipynb   
│   Project CV x Traffic (Part IV).ipynb    
│   Project CV x Traffic (Part V).ipynb    
│
└───master-plan-2014-planning-area-boundary-no-sea-shp
└───car_image_2019_Jan
└───car_image_2020_Jan
```
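
You can create the folders by hand in the Drive interface, or from code. A minimal sketch of the code route, assuming your Drive is already mounted; `make_image_folders` and `base_dir` are illustrative names only.

```python
import os


def make_image_folders(base_dir, names=("car_image_2019_Jan", "car_image_2020_Jan")):
    """Create the image folders under base_dir and return their paths."""
    paths = []
    for name in names:
        path = os.path.join(base_dir, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        paths.append(path)
    return paths


# Placeholder path, replace it with your own project folder:
# make_image_folders('/content/drive/My Drive/<your project folder>')
```

Because of `exist_ok=True`, running the cell again is harmless.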

### Step 4: Write function getImages
Since we're using concurrency to retrieve images, we'll first write a function called getImages which takes in three arguments:
1. index
2. row
3. destination_path

We do this because we'll later use the <strong>.iterrows method on the DataFrame</strong>, which yields both the index of the row containing the image URL and the row itself.

There are many ways to write this function, so we'll leave the details to you, as long as you end up saving a JPG image for the correct row.

Make sure you put a try-except block in your function. The last thing you want is an error during the GET request for an image with no way of handling it.

P.S. Remember the BytesIO and Image that you imported earlier? We'll be using them here.


<details>
  <summary>Click here once if you're unsure and need pseudocode</summary>
  <ol>
    <li><strong>Define</strong> getImages that takes in three arguments - (<font color='red'>index</font>, <font color='green'>row</font>, <font color='blue'>destination_path</font>)</li>
    <li>Declare a variable row_num that takes the current value of the <font color='red'>index</font>, the index of the current row </li>
    <li>Declare a variable temp_url that takes the current value of the <font color='green'>row</font>'s 'image' column</li>
    <li>Start a try/except block, where you first try to</li>
      <ul>
        <li>Declare a variable temp_res that is the response object of the GET request from the temp_url</li>
        <li>Declare a variable that contains the Image object opened from a BytesIO object wrapping the .content of the temp_res response object</li>
        <li>.save the variable in your <font color='blue'>destination_path</font> folder, with the row_num as image filename. For example, the image from index 0 should be named as 0.jpg</li>
        <li><a href='https://www.pythonanywhere.com/forums/topic/13795/'>Example reference</a> for save step.</li>
      </ul>
    <li>If an error occurs, just pass</li>
  </ol>
</details>
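
If you want something to compare your own attempt against, here is one possible reading of the steps, not the only valid solution. It reuses the pseudocode's variable names and places the GET request inside the try block so that request errors are caught too. The `timeout` value, the `raise_for_status` call, and the `convert('RGB')` call are additions of this sketch and are not required by the task.

```python
import os
from io import BytesIO

import requests
from PIL import Image


def getImages(index, row, destination_path):
    """Download the image at row['image'] and save it as <index>.jpg."""
    row_num = index
    temp_url = row['image']
    try:
        # timeout=30 is an assumption so a stalled request cannot hang forever
        temp_res = requests.get(temp_url, timeout=30)
        temp_res.raise_for_status()  # treat HTTP errors such as 404 as failures
        img = Image.open(BytesIO(temp_res.content))
        # convert to RGB so images in other modes can still be saved as JPEG
        img.convert('RGB').save(os.path.join(destination_path, f"{row_num}.jpg"))
    except Exception:
        # skip rows that fail, so one bad URL does not stop the whole run
        pass
```

Silently passing keeps the run going, but it also hides failures, which is why counting the saved images afterwards matters.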

In [None]:
# Step 4: Write getImages function

### Step 5: Test with first five rows of 2019 Jan
To see if you got the function right, let's run it through the first five rows of your 2019 Jan DataFrame.

Use a for loop over .iterrows() on the first five rows, and pass each index and row to the getImages function along with the destination path.
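
The loop can be sketched as below. `run_on_first_rows` is a hypothetical helper name, and `worker` stands for your getImages function; `df_2019` and the path in the commented-out call are placeholders.

```python
def run_on_first_rows(df, worker, destination_path, n=5):
    """Call worker(index, row, destination_path) on the first n rows of df."""
    for index, row in df.head(n).iterrows():
        worker(index, row, destination_path)


# Placeholder names, replace them with your own DataFrame and folder path:
# run_on_first_rows(df_2019, getImages, '/content/drive/My Drive/<your project folder>/car_image_2019_Jan')
```

Running the download sequentially on a handful of rows makes any mistake in getImages easy to spot before you scale up.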

If you do it right, you'll see something like this:

```
Google Drive folder (give it a name)
│   Project CV x Traffic (Part I).ipynb
│   Project CV x Traffic (Part II).ipynb
│   Project CV x Traffic (Part III).ipynb   
│   Project CV x Traffic (Part IV).ipynb    
│   Project CV x Traffic (Part V).ipynb    
│
└───master-plan-2014-planning-area-boundary-no-sea-shp
└───car_image_2019_Jan
│   └───0.jpg
│   └───1.jpg
│   └───2.jpg
│   └───3.jpg
│   └───4.jpg
│
└───car_image_2020_Jan
```
And your 0.jpg should look like this:

![2019JanExample](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectComputerVisionTraffic/2019JanExample.png)

In [None]:
# Step 5: Save first five images with getImages

# Full image collection

### Step 6: Run a concurrent getImages call for 2019
If you've successfully called getImages on the first five rows of your 2019 DataFrame, it's time to get all of your images.

Construct a concurrent process, similar to Part II Step 4. 

You'll need a ThreadPoolExecutor with 150 max workers. Make sure you pass four arguments to .submit (in order):
1. getImages
2. index
3. row
4. destination PATH in GDrive

Check the car_image_2019_Jan folder from time to time to see if images are being added.

<font color='red'><strong>Allocate 20-25 minutes for this task.</strong></font>
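
The submit pattern described above might look like the sketch below. Your Part II Step 4 code may be structured differently, and `run_concurrently` is an illustrative name. `rows` is any iterable of (index, row) pairs, such as the result of calling .iterrows() on your DataFrame.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_concurrently(rows, worker, destination_path, max_workers=150):
    """Submit worker(index, row, destination_path) for every (index, row) pair."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(worker, index, row, destination_path)
            for index, row in rows
        ]
        # Wait for every task; .result() re-raises anything the worker did not catch
        for future in as_completed(futures):
            future.result()
    return len(futures)


# Placeholder names, replace them with your own DataFrame and folder path:
# run_concurrently(df_2019.iterrows(), getImages, '/content/drive/My Drive/<your project folder>/car_image_2019_Jan')
```

Threads suit this task because each worker spends most of its time waiting on the network rather than computing.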

In [None]:
# Step 6: Run a concurrent getImages call for 2019

### Step 7: Count the number of images in car_image_2019_Jan
Now that you've retrieved the images and saved them in the car_image_2019_Jan folder, check how many images you've saved.

There are a few ways to do this, and one that works is the listdir function from the os library.

Make sure the number of images in your folder and the length of your 2019 DataFrame are similar. It's ok to be off by a few images since the API may occasionally return errors.

Don't forget to manually check that your images are ok in the folder as well.
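
One way to do the count with os.listdir is sketched below. `count_jpgs` is a hypothetical helper name, and the commented-out names are placeholders.

```python
import os


def count_jpgs(folder):
    """Count the files in folder whose names end with .jpg."""
    return sum(1 for name in os.listdir(folder) if name.lower().endswith(".jpg"))


# Placeholder names, replace them with your own folder path and DataFrame:
# saved = count_jpgs('/content/drive/My Drive/<your project folder>/car_image_2019_Jan')
# print(saved, len(df_2019))
```

Filtering on the extension avoids counting any stray non-image files that end up in the folder.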

In [None]:
# Step 7: Count the number of images in car_image_2019_Jan

### Step 8: Repeat Steps 6-7 for car_image_2020_Jan
Once you've successfully carried out Steps 6-7 for the 2019 data, do the same for 2020.



In [None]:
# Step 8: Retrieve all images for 2020 Jan

# Prepare for OpenCV in GPU mode in Colab
If you're working on this project series on Google Colab, you need to prepare a few things before you can run OpenCV, a popular computer vision library, in GPU mode.

We'll be using a very useful reference from https://towardsdatascience.com/how-to-use-opencv-with-gpu-on-colab-25594379945f


### Step 9: Run the first cell as specified by the author
Copy and run exactly the code the author specifies, i.e. the code block immediately after "First, run this cell:"

This will take quite a while. <font color='red'><strong>Allocate 1.5 hours for this.</strong></font>

In [None]:
# Step 9: Run the first cell as specified by the author

### Step 10: Import and check the version of OpenCV
The author tells you to check the version of OpenCV after installing it.

Import cv2 and then check its version.
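
In a notebook this is usually just `import cv2` followed by printing `cv2.__version__`. The sketch below wraps that in a hypothetical helper, `opencv_version`, which returns None instead of raising if cv2 cannot be imported, so you can tell a missing install apart from an old one.

```python
import importlib


def opencv_version():
    """Return the installed OpenCV version string, or None if cv2 is unavailable."""
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        return None
    return cv2.__version__


print(opencv_version())
```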

In [None]:
# Step 10: Import and check OpenCV version

### Step 11: Save the results of Step 9 to your own GDrive
The results of Step 9 aren't permanent, and you'll have to repeat Step 9 if you don't save them.

Scroll down the article a bit, and the author tells you to save the result of Step 9 to your own Drive.

The PATH in the article is slightly wrong for this setup, so make sure you fix it, e.g. change /gdrive/ to /drive/ if that is how your Drive is mounted. Otherwise you'll face an error.

In [None]:
# Step 11: Save the new OpenCV into your own GDrive

### Step 12: Copy the OpenCV library into your working directory
You'll have to copy the library into your working directory as well (this folder).

Run the final code cell provided by the author.

```
Google Drive folder (give it a name)
│   Project CV x Traffic (Part I).ipynb
│   Project CV x Traffic (Part II).ipynb
│   Project CV x Traffic (Part III).ipynb   
│   Project CV x Traffic (Part IV).ipynb    
│   Project CV x Traffic (Part V).ipynb    
│   cv2.cpython-36m-x86_64-linux-gnu.so
│
└───master-plan-2014-planning-area-boundary-no-sea-shp
└───car_image_2019_Jan
└───car_image_2020_Jan
```
Your folder should look something like this after running the final code cell.
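
The author's cell is the one to run here. If you ever need to repeat the copy from Python instead, a rough equivalent is sketched below. This is not the author's code: `copy_library` is a hypothetical helper, and both paths in the commented-out call are placeholders.

```python
import os
import shutil


def copy_library(src_file, dest_dir):
    """Copy src_file into dest_dir, keeping its filename, and return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy2(src_file, dest_dir)


# Placeholder paths, replace them with your own saved library and working folder:
# copy_library('<path to your saved>/cv2.cpython-36m-x86_64-linux-gnu.so', '<your working directory>')
```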


In [None]:
# Step 12: Copy the new OpenCV into your current folder

# End of Part III
What a long Part. 

In this Part, you successfully retrieved all images that the project needs.

On top of that, you've also successfully upgraded your OpenCV version so that you can work with the GPU.

In the next Part, you will finally get down to counting the number of vehicles on the roads in the images you have collected.