# Read and POST

This notebook reads the dataset from a file and sends each row as an HTTP POST request to the API endpoint. The code is based on the `2a-fake-website.ipynb` notebook.

In [1]:
%%bash
python3 -m pip install kafka-python



In [6]:
# Initialize Kafka Topics
from kafka.admin import KafkaAdminClient, NewTopic
from kafka.errors import TopicAlreadyExistsError

TOPICS = ["ingest", "ingest-test", "ingest-cleaned-oef2", "ingest-cleaned"]

admin_client = KafkaAdminClient(bootstrap_servers="localhost:9092")
topic_list = []
for topicname in TOPICS:
    print("Creating topic: {}".format(topicname))
    topic_list.append(NewTopic(name=topicname, num_partitions=1, replication_factor=1))
try:
    admin_client.create_topics(new_topics=topic_list, validate_only=False)
except TopicAlreadyExistsError:
    pass

Creating topic: ingest
Creating topic: ingest-test
Creating topic: ingest-cleaned-oef2
Creating topic: ingest-cleaned


In [7]:
import pandas as pd
from datetime import datetime

df = pd.read_csv("../911.csv")
# Sort the rows by timestamp so the data appears to stream into our API in real time.
# sort_values returns a new DataFrame, so the result is assigned back to df.
df = df.sort_values(by="timeStamp")
df

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e
2,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,2015-12-10 14:39:21,NORRISTOWN,HAWS AVE,1
5,40.253473,-75.283245,CANNON AVE & W 9TH ST; LANSDALE; Station 345;...,19446.0,EMS: HEAD INJURY,2015-12-10 15:39:04,LANSDALE,CANNON AVE & W 9TH ST,1
7,40.217286,-75.405182,COLLEGEVILLE RD & LYWISKI RD; SKIPPACK; Stati...,19426.0,EMS: RESPIRATORY EMERGENCY,2015-12-10 16:17:05,SKIPPACK,COLLEGEVILLE RD & LYWISKI RD,1
11,40.084161,-75.308386,BROOK RD & COLWELL LN; PLYMOUTH; 2015-12-10 @ ...,19428.0,Traffic: VEHICLE ACCIDENT -,2015-12-10 16:32:10,PLYMOUTH,BROOK RD & COLWELL LN,1
6,40.182111,-75.127795,LAUREL AVE & OAKDALE AVE; HORSHAM; Station 35...,19044.0,EMS: NAUSEA/VOMITING,2015-12-10 16:46:48,HORSHAM,LAUREL AVE & OAKDALE AVE,1
...,...,...,...,...,...,...,...,...,...
663517,40.157956,-75.348060,SUNSET AVE & WOODLAND AVE; EAST NORRITON; 2020...,19403.0,Traffic: VEHICLE ACCIDENT -,2020-07-29 15:46:51,EAST NORRITON,SUNSET AVE & WOODLAND AVE,1
663518,40.136306,-75.428697,EAGLEVILLE RD & BUNTING CIR; LOWER PROVIDENCE...,19403.0,EMS: GENERAL WEAKNESS,2020-07-29 15:52:19,LOWER PROVIDENCE,EAGLEVILLE RD & BUNTING CIR,1
663521,40.015046,-75.299674,HAVERFORD STATION RD & W MONTGOMERY AVE; LOWER...,19041.0,Traffic: VEHICLE ACCIDENT -,2020-07-29 15:52:46,LOWER MERION,HAVERFORD STATION RD & W MONTGOMERY AVE,1
663519,40.013779,-75.300835,HAVERFORD STATION RD; LOWER MERION; Station 3...,19041.0,EMS: VEHICLE ACCIDENT,2020-07-29 15:52:52,LOWER MERION,HAVERFORD STATION RD,1
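The `zip` column is displayed as floats (`19401.0`), which typically happens when pandas finds missing values in an otherwise integer column. Since `NaN` is not valid JSON, a strict parser on the receiving side may reject such a payload. The following is a minimal sketch of one way to guard against that, assuming missing values should be sent as `null`; the helper `to_payload` and the sample record are hypothetical and not part of the original notebook.

```python
import json
import math

# Columns forwarded to the API (the same list the notebook uses later).
FIELDS = ["lat", "lng", "desc", "zip", "title", "timeStamp", "twp", "addr", "e"]


def to_payload(record, fields=FIELDS):
    """Hypothetical helper: keep only `fields` and map NaN to None (JSON null)."""
    payload = {}
    for key in fields:
        value = record.get(key)
        # numpy.float64 subclasses float, so this also catches pandas NaN values.
        if isinstance(value, float) and math.isnan(value):
            value = None
        payload[key] = value
    return payload


# Made-up record with a missing zip code, for illustration only.
example = {
    "lat": 40.121182,
    "lng": -75.351975,
    "desc": "HAWS AVE; NORRISTOWN",
    "zip": float("nan"),
    "title": "Fire: GAS-ODOR/LEAK",
    "timeStamp": "2015-12-10 14:39:21",
    "twp": "NORRISTOWN",
    "addr": "HAWS AVE",
    "e": 1,
}

# allow_nan=False makes json.dumps raise if any NaN slipped through.
body = json.dumps(to_payload(example), allow_nan=False)
```

In a sending loop, this helper could be applied as `json=to_payload(row.to_dict())`.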


In [8]:
import random
import time
import requests
from IPython.display import clear_output

# Send the 911 call records to the API
# Using the same seed will result in the same "random" sequence of delays
random.seed("GDV2020")
for index, row in df.head(10000).iterrows(): # only the first 10,000 rows are sent; change the argument of `head` to send more or fewer
    time.sleep(random.randint(0, 10)/1000)
    #print(row[["lat", "lng", "desc", "zip", "title", "timeStamp", "twp", "addr", "e"]].to_dict())
    requests.post(
        'http://localhost:5000/nineoneone',
        json=row[["lat", "lng", "desc", "zip", "title", "timeStamp", "twp", "addr", "e"]].to_dict(),
    )
    if index % 5 == 0:
        clear_output()

clear_output()
print("All rows sent to API.")

All rows sent to API.
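The comment in the cell above says that reusing the seed reproduces the same "random" sequence. A small sketch of that behaviour, using a separate `random.Random` instance instead of the module-level generator so that nothing else is affected; the function name `delays_ms` is an assumption made for illustration.

```python
import random


def delays_ms(seed, count):
    """Return `count` pseudo-random delays in milliseconds (0 to 10 inclusive)."""
    rng = random.Random(seed)  # independent generator seeded with `seed`
    return [rng.randint(0, 10) for _ in range(count)]


# Two runs with the same seed yield identical delay sequences.
first = delays_ms("GDV2020", 5)
second = delays_ms("GDV2020", 5)
same_sequence = first == second
```

Because the delays are reproducible, two runs of the notebook send the rows with the same pauses between requests, which makes the downstream behaviour easier to compare between runs.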


When you're finished with this notebook, _turn off the endpoint_ you just created and open `3_cleanup.ipynb` to create a cleanup pipeline using Spark Structured Streaming.