View Live
It was a pleasure and also quite a challenge to work on this project. Below I have listed the issues that arose in the order that they occurred, what I tried, and how I solved them. Further down is my project steps outline.
-
CSV
Problem: The CSV file was too large for GitHub.
Solution: I added it to my .gitignore file, and used `git reset --soft HEAD^` and `git reset HEAD <heatmap/.GeoLite2-City-CSV_20190312/.GeoLite2-City-Blocks-IPv4.csv` to go back to prior commits and remove the file from staging.
-
HEROKU ERROR
Problem: I got an error code when trying to launch the Heroku site: `code=H10 desc="App crashed"`, and scrolling up the traceback I found: `ModuleNotFoundError: No module named 'geodata-django'`. I did a project-wide search for 'geodata-django' and found that I had entered it in the Procfile as `web: gunicorn geodata-django.wsgi`.
Tried:
-
I replaced 'geodata-django' with 'GeoData' but got the same message: `No module named 'geodata-django'`. Ultimately this step was partly the answer.
-
I reviewed the Heroku set-up and tried `$ heroku ps:scale web=1` to ensure that at least one instance of the app was running. I got a positive response: `Scaling dynos... done, now running web at 1:Free`.
-
I restarted Heroku with `heroku restart`.
-
I connected a psql session with my remote database: `$ heroku pg:psql`
output (abbr.) --> `Connecting to gresql-polished-87072 psql blooming-journey-52100::DATABASE=>`
-
I tried writing to the Procfile using `echo "web: python app.py" > Procfile` in the command line. This was a cool trick that I'm glad I got to try, but unfortunately I got the same result. (https://stackoverflow.com/questions/15790691/procfile-not-found-heroku-python-app)
-
From (https://stackoverflow.com/questions/29481506/heroku-procfile-not-working) I tried `$ heroku run bash` and `$ cat Procfile`.
output --> `web: gunicorn geodata-django.wsgi` and `no module named geodata-django`.
I entered: `$ web: gunicorn geodata.wsgi` (Progress! I got a `no module named geodata` error.) I repeated these steps with `web: gunicorn GeoData.wsgi` (New error! `ModuleNotFoundError: No module named 'GeoData.heroku_settings'`) And that's right, there was no module named that at the time (it was temporarily commented out).
-
I un-commented heroku_settings.py, then pushed to Git and to Heroku. I got an error while running `$ python manage.py collectstatic --noinput`. I tried adding a CSS file under 'static/'.
Solution: It needed to be `web: gunicorn GeoData.wsgi`, but I had also neglected to push to Heroku and to fix a few other things, such as un-commenting heroku_settings.py and adding my requirements.txt.
Lessons Learned:
I should make all apps, projects, and files lower case to reduce the chance of this type of error. I am now well-versed in the Heroku deployment. In the past I had done no more than one deployment per project; I did not have experience with doing it frequently. I was following a guide and didn't have the process memorized. But I do now!
-
-
DATAFRAME VALUE ERROR
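Problem: In load_geodata.py, `print(df)` works, but `return(df)` gets a `ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().` A likely cause is that Django tests the truthiness of whatever a management command's handle() returns before writing it to stdout, and pandas refuses to evaluate a whole DataFrame as a single boolean.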
Tried:
1. `df = df[['latitude', 'longitude']]` followed by `df.head()` -- no errors. `print(df.head())` -- outputs the top 5 rows:
   latitude longitude
   0 -35.5016 138.7819
   1 24.4798 118.0819
   2 24.4798 118.0819
   3 -33.4940 143.2104
   4 23.1167 113.2500
2. `print(df.all())` -- outputs:
   latitude False
   longitude False
   dtype: bool
3. `print(df.bool())` -- outputs `ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().`
4. `print(df.any())` -- outputs:
   latitude True
   longitude True
   dtype: bool
Solution:
I consulted with Clinton from Momentum, who thought that pandas would be problematic for the future steps in this challenge. He suggested I use the simpler CSV Reader that I had previously rejected. Specifically, the DictReader. So I scrapped pandas. I spent many hours working with pandas in this project. Goodbye beautiful code! I'll never forget you!
In memoriam:
`import pandas as pd
import io

def handle(self, *args, **kwargs):
    # Truncated_data test:
    # truncated_data = '''
    # network,geoname_id,registered_country_geoname_id,represented_country_geoname_id,is_anonymous_proxy,is_satellite_provider,postal_code,latitude,longitude,accuracy_radius
    # 1.0.0.0/24,2070667,2077456,,0,0,5214,-35.5016,138.7819,100
    # 1.0.1.0/24,1811017,1814991,,0,0,,24.4798,118.0819,50
    # '''
    # df = pd.read_csv(io.StringIO(truncated_data), usecols=["latitude", "longitude"])

    # Parses from entire CSV file:
    df = pd.read_csv(("heatmap/.GeoLite2-City-CSV_20190312/.GeoLite2-City-Blocks-IPv4.csv"), usecols=["latitude", "longitude"])
    df = df[['latitude', 'longitude']]
    df.head()  # just top 5 rows`
-
SAVE LAT AND LONGS FROM CSV
Problem:
I needed to figure out how to get the latitude and longitude data saved through the LatLong Model, so it could then be serialized and written to a JSON (or GeoJSON) API endpoint.
Solution:
I used the CSV DictReader to isolate lat and long, and create model objects. Then used the ModelSerializer class to manipulate the serialization, and send it through the ListAPIView to return the GeoJSON data in the API endpoint.
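As a rough illustration of the DictReader step, the sketch below reads the two sample rows from the truncated-data test in the pandas section and keeps only the latitude and longitude columns. The helper name `read_coordinates` is hypothetical; in the project each pair would go to the LatLong model instead of a list.

```python
import csv
import io

# Two sample rows with the same header as the GeoLite2 IPv4 blocks file.
SAMPLE_CSV = """network,geoname_id,registered_country_geoname_id,represented_country_geoname_id,is_anonymous_proxy,is_satellite_provider,postal_code,latitude,longitude,accuracy_radius
1.0.0.0/24,2070667,2077456,,0,0,5214,-35.5016,138.7819,100
1.0.1.0/24,1811017,1814991,,0,0,,24.4798,118.0819,50
"""


def read_coordinates(csv_file):
    """Return (latitude, longitude) string pairs from an open CSV file.

    Hypothetical helper: the real management command would create a
    LatLong object per row instead of collecting a list.
    """
    reader = csv.DictReader(csv_file)
    return [(row["latitude"], row["longitude"]) for row in reader]


coordinates = read_coordinates(io.StringIO(SAMPLE_CSV))
print(coordinates)
```

DictReader returns every value as a string, so any numeric conversion happens later, for example when the model field validates the value.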
-
DELETE LARGE NUMBER OF TEST OBJECTS
Problem:
More than 109,000 model objects were created during testing, and they needed to be deleted before I created objects from the full CSV file of more than 3 million rows.
Solution:
I commented out the code in the management command's handle function, and instead ran `LatLong.objects.all().delete()`.
-
VALIDATION ERROR
Problem:
When running my management command to pull out latitudes and longitudes from the CSV and create objects, I got the following error:
.../python3.7/site-packages/django/db/models/fields/__init__.py", line 1559, in to_python
params={'value': value},
django.core.exceptions.ValidationError: ["'' value must be a decimal number."]
Tried:
I looked through the CSV and saw that some of the lat/long values were whole numbers. I researched whether that could throw the error, and it didn't look like it should, because the decimal field of the LatLong model turns whole numbers into numbers with zeros after the decimal point. I also reconsidered my choice of the decimal field over the float field, and decided that the decimal field should be fine.
Solution:
To get around the error, I added this to my function:
try:
    LatLong.objects.create(latitude=row['latitude'], longitude=row['longitude'])
except ValidationError:
    pass
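The `''` in the error message suggests that the offending rows had an empty latitude or longitude field rather than a whole number. Below is a minimal sketch of filtering those rows out before creating objects, assuming the values arrive as strings from the DictReader. `parse_coordinate` and `valid_pairs` are hypothetical names, not part of the project.

```python
from decimal import Decimal, InvalidOperation


def parse_coordinate(value):
    """Return value as a Decimal, or None if it cannot be parsed (e.g. '')."""
    try:
        return Decimal(value.strip())
    except InvalidOperation:
        return None


def valid_pairs(rows):
    """Yield (latitude, longitude) Decimals, skipping rows missing either one."""
    for row in rows:
        lat = parse_coordinate(row["latitude"])
        lng = parse_coordinate(row["longitude"])
        if lat is not None and lng is not None:
            yield lat, lng


rows = [
    {"latitude": "-35.5016", "longitude": "138.7819"},
    {"latitude": "", "longitude": ""},       # blank fields, as in the error above
    {"latitude": "24", "longitude": "118"},  # whole numbers parse fine
]
pairs = list(valid_pairs(rows))
print(pairs)
```

Compared with catching ValidationError and passing, an explicit check like this makes it easier to count or log how many rows were skipped.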
-
ACCIDENTAL LARGE FILE PUSH
Problem:
I accidentally pushed my data.dump file, created to attempt to populate the database on Heroku with the database created locally, onto Git.
Tried:
I deleted the file from the project and from Git, but Git became frozen in a loop with the file stuck in limbo somehow.
Solution:
I followed the steps in the link below:
-
SERIALIZE MODEL OBJECTS INTO GEOJSON
Problem:
Even though hardcoded test points written in GeoJSON format rendered on the heatmap layer, and even though I wrote a serializer that rendered identical-looking GeoJSON, the console showed an error saying that it was not valid GeoJSON.
Solution:
I researched, tweaked, and repeatedly tested the `map.on('load', function()` callback in my map.js, the serializer, and the view (and the type of view). Ultimately, what worked was adding a `list` method to the ListAPIView.
-
HEROKU DATABASE
Problem:
I followed the steps here to dump my local database into a file, commit to Git, and push onto Heroku, and then pull back from Git without pushing it (because it's way too large). I never got an error, but it didn't populate my Heroku database.
Tried:
I repeated the steps, and confirmed it still didn't work.
Solution:
I used the same process, but with the CSV file. And then on Heroku I used the management command to read the CSV, pull out the latitude and longitude data, and create objects.
Problem 2:
The command above ran for around half an hour. About 10 minutes in, Heroku emailed me that I had exceeded my 10,000-row limit and that in 7 days they would have to revoke my "insert" privileges. By the time I stopped the command because it was no longer adding to Heroku, I had used around 145,000 rows. I did notice that the number decreased over time (to around 139,000 one hour later).
Tried:
I considered paying for an upgrade plan, but that would involve recreating the database. Since it was the final hour, I didn't want to chance taking a working (but limited) product down and re-running it.
Solution:
The program is working, the heat layer loads, but it only has around 145,000 objects in my models, instead of over three million. In order to solve this problem, I would have needed to compress the data significantly somehow.
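One possible way to reduce the row count, which this project did not do, would be to round the coordinates and store one row per distinct point together with a count that the heatmap could use as a weight. A minimal sketch under that assumption follows; `aggregate_points` and the rounding precision are illustrative choices only.

```python
from collections import Counter


def aggregate_points(pairs, precision=1):
    """Count how many (lat, long) pairs fall on each rounded coordinate.

    Rounding to `precision` decimal places merges nearby points, so the
    database would hold one row per distinct point plus a count.
    """
    counts = Counter()
    for lat, lng in pairs:
        counts[(round(lat, precision), round(lng, precision))] += 1
    return counts


# Sample values taken from the pandas output earlier in this README.
sample = [
    (-35.5016, 138.7819),
    (24.4798, 118.0819),
    (24.4798, 118.0819),
    (-33.4940, 143.2104),
    (23.1167, 113.2500),
]
weighted = aggregate_points(sample)
print(len(sample), "rows ->", len(weighted), "distinct points")
```

How much this helps depends on how many rows share a location, which would need to be measured on the full file.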
-
Set up Project
Create:
-
repo on GitHub
-
Django project
-
html templates
-
static files: CSS JavaScript (finish at the end)
-
urls
-
model(s), make migrations
-
admin class for each model
-
views for index page
-
-
Research how to access CSV file in Django
Write the python script
Test in Python shell
Create a management command to load CSV file
Test parsing lat/long data from sample (top three lines) of CSV
Parse data from entire IPv4 file
Decide between these two methods (initially chose pandas but later switched to CSV Reader)
CSV module: (https://docs.python.org/3/library/csv.html)
Parse specific columns from CSV file (https://stackoverflow.com/questions/16503560/read-specific-columns-from-a-csv-file-with-csv-module)
Write code to return list of coordinates that can be used for JSON
Create model objects (25 minutes to load!)
-
Research MapBox
Must be JSON from the API to satisfy assignment requirements
Research using MapBox, MapBox gl JS, Leaflet, and Leaflet-heat to draw geographical data on a map in the browser
Convert JSON to GeoJSON in this format:
`{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] } }`
Coordinates in this order: long, lat
Bind in geo bounding box using MapBox
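A minimal sketch of the JSON-to-GeoJSON conversion in the steps above, wrapping each point in the Feature format shown and collecting the points into a FeatureCollection. The function names are hypothetical. The empty `properties` member is included because the GeoJSON specification expects one on every Feature. The main point is the coordinate order: longitude first, then latitude.

```python
import json


def to_feature(latitude, longitude):
    """Build a GeoJSON Point Feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [float(longitude), float(latitude)],
        },
        "properties": {},
    }


def to_feature_collection(pairs):
    """Wrap (latitude, longitude) pairs in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [to_feature(lat, lng) for lat, lng in pairs],
    }


# Latitude 10.1, longitude 125.6: the same point as the example above.
collection = to_feature_collection([(10.1, 125.6)])
print(json.dumps(collection, indent=2))
```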
-
Django REST Framework buildout
Research endpoint requirements for heatmap
Define a REST endpoint that returns a list of coordinates within a geographic coordinate bounding box
url for api in heatmap.urls
api app:
views
serializers (https://www.django-rest-framework.org/api-guide/fields/#decimalfield)
urls
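The bounding-box filter behind such an endpoint can be sketched in plain Python. This is an assumption about the logic only, not the project's view code: in Django the same comparison would normally be expressed as a queryset filter, and the parameter names here are made up for illustration. The sketch also ignores boxes that cross the antimeridian.

```python
def in_bounding_box(latitude, longitude, south, west, north, east):
    """Return True if the point lies inside the box (edges included)."""
    return south <= latitude <= north and west <= longitude <= east


def filter_points(pairs, south, west, north, east):
    """Keep only the (latitude, longitude) pairs inside the bounding box."""
    return [
        (lat, lng)
        for lat, lng in pairs
        if in_bounding_box(lat, lng, south, west, north, east)
    ]


points = [(-35.5016, 138.7819), (24.4798, 118.0819), (23.1167, 113.2500)]
# An arbitrary example box; the first point falls outside it.
inside = filter_points(points, south=20.0, west=110.0, north=30.0, east=120.0)
print(inside)
```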
-
Deploy to Heroku
Create requirements.txt for dependencies
Install Heroku
Finishing Steps:
1. Follow steps from (https://jaketrent.com/post/django-loaddata-heroku/) to dump data from database onto Heroku (bypasses Git in a way). Remove .json file from project.
2. Serialize model objects to output GeoJSON; store them in an api endpoint
serializers.py
api/views.py
api/urls.py
3. Push to Heroku
4. Add api endpoint to map.addSource; check that it adds points on heat layer of map.
5. Check that requirements.txt is up to date
6. Push to Git
7. Push to Heroku, limiting the queryset to 1000 so the server connection will not time out. Keep raising the queryset limit and retesting
8. Submit