# Lesson 17 - Preparing Data

---

![Data Prep](https://github.com/CodeYourDreams/Develop_Curriculum/blob/master/Mapping%20Applications%20with%20Flask/Images/file.png?raw=true "segment")

In the last lecture, we reviewed some of the common mapping tools and everyone was tasked with determining which mapping tool was best for their application.

__But what's a completed map without data?__

When using any of the mapping tools mentioned, you'll be dealing with geometric data: specifically, data with shape geometry and latitude and longitude values.

However, not all data comes with geometric values.

---

## CSV <br>![CSV](https://www.fundrecs.com/img/CSV.png)

A CSV, or comma-separated values, file is a plain-text file type used to store tabular data, such as a database table or spreadsheet. CSV files are very common; they can be created and exported from services like Microsoft Excel and Google Sheets.


Let's take a look at one CSV file from the City of Chicago Database Library. <br><br>__Use pandas to open this dataset:__ <br>https://data.cityofchicago.org/api/views/j8a4-a59k/rows.json?accessType=DOWNLOAD

In [1]:
#conda install pandas

In [2]:
import pandas
data = pandas.read_csv('Chicago_Restaurants_Data.csv')
print(data.head()) # the .head() method displays only the first 5 rows of data

   Inspection ID                        DBA Name                   AKA Name  \
0        2300607  77 W WACKER DR HOSPITALITY LLC                    CLUB 77   
1        2300609                  TOTTO'S MARKET             TOTTO'S MARKET   
2        2300603                LA FIESTA BAKERY  LA FIESTA BAKERY/TAQUERIA   
3        2300586                 THE BEER TEMPLE            THE BEER TEMPLE   
4        2300589                 THE BEER TEMPLE            THE BEER TEMPLE   

   License #  Facility Type           Risk                  Address     City  \
0  2658215.0     Restaurant   Risk 3 (Low)          77 W WACKER DR   CHICAGO   
1  2637113.0  Grocery Store   Risk 3 (Low)       751 S DEARBORN ST   CHICAGO   
2  1488177.0     Restaurant  Risk 1 (High)       6424 S PULASKI RD   CHICAGO   
3  2670921.0   TAVERN/STORE   Risk 3 (Low)  3169-3175 N ELSTON AVE   CHICAGO   
4  2670919.0   TAVERN/STORE   Risk 3 (Low)  3169-3175 N ELSTON AVE   CHICAGO   

  State      Zip Inspection Date Inspection 

This dataset is derived from food inspections of various restaurants and food distributors in Chicago.

The key piece of information needed for any mapping data is a pair of latitude and longitude columns. In this dataset, the last two columns, `Latitude` and `Longitude`, give the location of each restaurant.
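
Before mapping a CSV file, it can help to confirm that the coordinate columns are actually present and filled in. The following is a minimal sketch using only the standard library; the sample rows and the `missing_coordinates` helper are made up for illustration, and only the `Latitude` and `Longitude` column names come from the dataset above.

```python
import csv
import io

# Hypothetical sample data; only the column names match the Chicago dataset.
SAMPLE_CSV = """DBA Name,Latitude,Longitude
CAFE A,41.88,-87.63
CAFE B,,
CAFE C,41.90,-87.65
"""

def missing_coordinates(csv_file, lat_col="Latitude", lon_col="Longitude"):
    """Return the rows whose latitude or longitude is empty."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    # Fail early if the file has no coordinate columns at all.
    for col in (lat_col, lon_col):
        if col not in fieldnames:
            raise KeyError(f"missing column: {col}")
    return [row for row in reader if not row[lat_col] or not row[lon_col]]

bad_rows = missing_coordinates(io.StringIO(SAMPLE_CSV))
print(len(bad_rows), "row(s) without coordinates")
```

Rows flagged this way would need to be dropped or geocoded from their address before they can be placed on a map.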

---

## JSON <br>![JSON](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRTSdZjBgpqTiu5p0In6sbJDl_bvjX5BwDCSvhiEKJ3nmgwc_-k)

To make these values usable in our maps, we must convert our CSV file into a JSON file. JSON, or JavaScript Object Notation, is a text format for storing data in an easily readable way. JSON data is generally stored as several dictionaries in an array.

For instance, this is an example of JSON Data:

In [12]:
chicago = [{
    "name" : "Loop",
    "direction" : "Central"
},
{
    "name" : "Austin",
    "direction" : "West"
},
{
    "name" : "Lincoln Park",
    "direction" : "North"
}]
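
Strictly speaking, the cell above is a Python list of dictionaries. To turn it into actual JSON text, and to read JSON text back into Python objects, the standard `json` module can be used. A minimal sketch (the variable names `json_text` and `restored` are only illustrative):

```python
import json

chicago = [
    {"name": "Loop", "direction": "Central"},
    {"name": "Austin", "direction": "West"},
    {"name": "Lincoln Park", "direction": "North"},
]

# Serialize the Python list of dictionaries to a JSON-formatted string.
json_text = json.dumps(chicago, indent=4)
print(json_text)

# Parse the JSON string back into Python objects.
restored = json.loads(json_text)
print(restored[0]["name"])
```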

Let's take a look at this video that breaks down how to convert a CSV file to a JSON file:<br><br>https://www.youtube.com/watch?v=La6ZO8vu-1w (stop at 5:40)

### Breaking down the steps

![Instructions](https://i.gifer.com/ZCLZ.gif)

1. Import csv and json packages
        import csv 
        import json 
<br>
2. Create a file path for your CSV file 
       csvFilePath = "file.csv"
<br>       
3. Create a file path for your new JSON file 
       jsonFilePath = "newfile.json"
<br>       
4. Create an empty dictionary to store your JSON Data 
       data = {}
<br>
5. Open and read the CSV file: 
        with open(csvFilePath, newline='') as csvFile:
            csvReader = csv.DictReader(csvFile)
            for csvRow in csvReader:
                id = csvRow['id']  # look at this line for Exercise 1
                data[id] = csvRow

6. Using the file path from step 3, create a JSON file to write the converted data on:
        with open(jsonFilePath, 'w') as jsonFile:
            jsonFile.write(json.dumps(data, indent=4))
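
The six steps above can be sketched as one reusable function. This is a minimal sketch, not the video's exact code: the `csv_to_json` name, the `key_column` parameter, and the two-row sample file are assumptions made for illustration.

```python
import csv
import json
import os
import tempfile

def csv_to_json(csv_path, json_path, key_column):
    """Convert a CSV file to a JSON file keyed by the values in key_column."""
    data = {}
    with open(csv_path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):
            data[row[key_column]] = row
    with open(json_path, "w") as json_file:
        json.dump(data, json_file, indent=4)
    return data

# Demonstration with a small, made-up CSV written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "file.csv")
    json_path = os.path.join(tmp, "newfile.json")
    with open(csv_path, "w", newline="") as f:
        f.write("id,name\n1,Loop\n2,Austin\n")
    csv_to_json(csv_path, json_path, key_column="id")
    with open(json_path) as f:
        result = json.load(f)

print(result)
```

Rows that share the same key value overwrite one another in the dictionary, so the key column should hold a unique value for every row.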

----

### Exercise 1

Using what you have learned from the video, determine which column name should be used as the key to gain access to each row in the Chicago_Restaurants_Data.csv file. <br><br> _Hint: Open the file in a text editor to see which column name appears first._

In [61]:
# id = csvRow['Your answer here']

---

### Exercise 2

Building off the id you found in Exercise 1, convert the Chicago Restaurants CSV file to a JSON file named Chicago_Restaurants.json.

If you ran the commands correctly, a Chicago_Restaurants.json file should be created within your current directory.

----

## GeoJSON

Another file type that you will likely encounter while creating maps is GeoJSON. GeoJSON is a JSON-based format that encodes a variety of geographic data structures.

For instance:
- Point ⚫
- LineString _〰_
- Polygon 🔶
- MultiPoint
- MultiLineString
- MultiPolygon
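
To make the list above more concrete, here is a sketch of how the first three geometry types look as GeoJSON objects, written as Python dictionaries. The coordinate values are arbitrary points near Chicago chosen for illustration. Positions are given as `[longitude, latitude]`, and a polygon ring must end at the point where it starts.

```python
import json

point = {"type": "Point", "coordinates": [-87.6298, 41.8781]}

line_string = {
    "type": "LineString",
    "coordinates": [[-87.6298, 41.8781], [-87.6500, 41.9000]],
}

polygon = {
    "type": "Polygon",
    # A list of rings; each ring is a closed list of positions.
    "coordinates": [[
        [-87.70, 41.85], [-87.60, 41.85], [-87.60, 41.95],
        [-87.70, 41.95], [-87.70, 41.85],
    ]],
}

for geometry in (point, line_string, polygon):
    print(json.dumps(geometry))
```

The Multi* types follow the same pattern with one more level of nesting: a MultiPoint holds a list of positions, and a MultiPolygon holds a list of polygons.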

A GeoJSON file can still end with a .json extension, but the structure of the data inside the file sets GeoJSON apart from a regular JSON file:

In [14]:
{
  "type": "Feature",
  "geometry": {
    "type": "Point", # the geometry type
    "coordinates": [-87.6298, 41.8781] # [longitude, latitude]
  },
  "properties": {
    "name": "Chicago" # other aspects of the data
  }
}

{'type': 'Feature',
 'geometry': {'type': 'Point', 'coordinates': [-87.6298, 41.8781]},
 'properties': {'name': 'Chicago'}}
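
GeoJSON positions list longitude first and latitude second, which is the reverse of how coordinates are usually spoken, so swapping them is an easy mistake. One way to guard against it is a small helper that takes named arguments and checks their ranges. The `point_feature` function below is a hypothetical helper written for this lesson, not part of any library.

```python
def point_feature(latitude, longitude, **properties):
    """Build a GeoJSON Point Feature as a plain dict ([longitude, latitude] order)."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": properties,
    }

chicago = point_feature(latitude=41.8781, longitude=-87.6298, name="Chicago")
print(chicago["geometry"]["coordinates"])
```

The range check catches some swaps but not all of them: for Chicago, passing the values in the wrong order would put -87.6298 in the latitude slot, which is still a valid latitude, so named arguments remain the main safeguard.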

Now let's convert a CSV file to a GeoJSON file:

1. For this conversion, we need not only the libraries we imported earlier but also some features from the geojson library:

In [22]:
import csv, json
from geojson import Feature, FeatureCollection, Point

2. Next, create an empty list to hold the new GeoJSON data

In [23]:
features = []

3. Open the CSV file and read in each row

In [38]:
with open('Chicago_Restaurants_Data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # The two prints and the break are for demonstration only;
        # they are not part of the final function.
        print('This is what "row" gives back ')
        print(row)
        break

This is what "row" gives back 
['Inspection ID', 'DBA Name', 'AKA Name', 'License #', 'Facility Type', 'Risk', 'Address', 'City', 'State', 'Zip', 'Inspection Date', 'Inspection Type', 'Results', 'Latitude', 'Longitude']


4. Index the latitude and longitude values from each row, in addition to any other values you want in the properties section. Include them in the following outline:

In [42]:
with open('Chicago_Restaurants_Data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # latitude should equal row[-2] -> latitude, longitude = map(float, (row[-2], longitude))
        latitude, longitude = map(float, (latitude, longitude))
        features.append(
            Feature(
                geometry=Point((longitude, latitude)),
                properties={
                    'name': name,  # name should equal row[1] -> 'name': row[1]
                    'risk': risk,
                    'results': results
                }
            )
        )


NameError: name 'latitude' is not defined

---

![Error](https://www.publicdomainpictures.net/pictures/250000/nahled/erorr.jpg)

-----

### Exercise 3

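Edit the above code with the appropriate index values in the variable row. Use the comments in the code to get you started. Keep in mind that the first row the reader returns is the header row, so it must be skipped before its values are converted with float. <br> The code should run error-free if it's written correctly.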

---

5. Lastly, create a JSON file and write the data to it:

In [49]:
collection = FeatureCollection(features)
with open("GeoObs.json", "w") as f:
    f.write('%s' % collection)
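
The steps above can also be sketched end to end without the `geojson` package, since a FeatureCollection is just a dictionary with a `features` list. This is an alternative under stated assumptions, not the lesson's method: the sample rows and the `rows_to_feature_collection` helper are hypothetical, and `csv.DictReader` is used so that values are looked up by column name and the header row is never treated as data.

```python
import csv
import io
import json

# Made-up rows; only the column names follow the Chicago dataset.
SAMPLE_CSV = """DBA Name,Risk,Results,Latitude,Longitude
CAFE A,Risk 3 (Low),Pass,41.88,-87.63
CAFE B,Risk 1 (High),Fail,,
"""

def rows_to_feature_collection(csv_file):
    """Build a GeoJSON FeatureCollection dict from rows that have coordinates."""
    features = []
    for row in csv.DictReader(csv_file):
        if not row["Latitude"] or not row["Longitude"]:
            continue  # skip rows that cannot be placed on a map
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude]
                "coordinates": [float(row["Longitude"]), float(row["Latitude"])],
            },
            "properties": {
                "name": row["DBA Name"],
                "risk": row["Risk"],
                "results": row["Results"],
            },
        })
    return {"type": "FeatureCollection", "features": features}

collection = rows_to_feature_collection(io.StringIO(SAMPLE_CSV))
print(json.dumps(collection, indent=2))
```

To save the result to disk, `json.dump(collection, f)` can be called inside a `with open(..., "w") as f:` block, in the same way as step 5.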

<br>

#### Now, no matter which mapping tool you're using, you can convert your data to the file format it needs!

---

## Homework

Write out the steps to prepare the data for your app.