# API Chat Database using Mongo, Flask, and Python

In [443]:
import requests
from bs4 import BeautifulSoup
import re
from flask_pymongo import PyMongo
from pymongo import MongoClient
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import pandas as pd



### We have made our connection with Mongo, Flask and Python in Visual Studio Code.
- And we are able to insert into and request data from Mongo.<br>
- Now we are going to try to track down something we can use to populate our chat database.
 

### We could use a real chat app like Slack...<br> but we could have some fun with it and fill our database with dialogue from a script.<br>
Some options were:
- Monty Python's Life of Brian (the namesake of our beloved language)
- The Big Lebowski (NLTK would go crazy with all the f-bombs, and Walter would definitely tilt the scale to the negative.)
- Twin Peaks (a personal favorite)


-------------

### We found a Twin Peaks script that looks relatively easy to parse. It doesn't seem to need a heavy workload of scraping and cleaning.

In [400]:
data = requests.get("http://www.lynchnet.com/tp/tp01.html")
data

<Response [200]>

(The following steps are for my benefit, to help me see the type of data each turn of the screw produces. The data goes from a Response object to a BeautifulSoup object, then to a ResultSet, then a Tag, and finally a list.)

---------

In [401]:
type(data)

requests.models.Response

In [402]:
# Name the parser explicitly; otherwise BeautifulSoup guesses one and warns
soup = BeautifulSoup(data.text, "html.parser")
type(soup)

bs4.BeautifulSoup

In [403]:
script = soup.select('pre')
type(script)


bs4.element.ResultSet

In [404]:
scr = script[0]
type(scr)

bs4.element.Tag

In [405]:
s = scr.contents
type(s)

list

--------------

### Now we have a list and we can start cleaning the contents. <br> There are lots of \n characters and it looks like a great opportunity to use our Regular Expressions.

In [406]:
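# Remove newline, tab and letter "b" characters from each fragment
clean_script = [re.sub(r"[\n\tb]", "", str(phrase)) for phrase in s]

# Remove every "<", ".", "*" and ">" character
cl_scr = [re.sub(r"[<.*>]", "", str(phrase)) for phrase in clean_script]

# Skip the first 8 fragments so the list starts at "FADE IN:"
del cl_scr[:8]
cl_scr[0]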

'FADE IN:    1 EXT GREAT NORTHERN HOTEL - DAY    Dawn reaks over the Great NorthernCUT TO:    2 INT GREAT NORTHERN HOTEL ROOM - DAY    We hear him efore we see him, ut DALE COOPER is perched six inches aovethe floor in a one-handed yoga "frog" position, wearing oxer shorts and a pair ofsocks, talking into the tape recorder which is sitting on the carpet near his head    COOPERDiane  6:18 am, room 315, Great NorthernHotel up here in Twin Peaks Slept pretty wellNon-smoking room No toacco smell That\'s anice consideration for the usiness traveller Ahint of douglas fir needles in the air As SheriffTruman indicated they would, everything thishotel promised, they\'ve delivered: clean,reasonaly priced accomodations  telephoneworks  athroom in really tip-top shape  nodrips, plenty of hot water with good, steadypressure  could e a side-enefit of thewaterfall just outside my window  firmmattress, ut not too firm  and no lumps likethat time I told you aout down in El Paso Diane, what a nightmare 
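The output above gives away two problems with the character classes. `[\n\tb]` matches a newline, a tab, *or the letter b*, which is why "breaks" became "reaks" and "boxer" became "oxer". `[<.*>]` is also a character class, so it deletes every `<`, `.`, `*` and `>` one at a time (taking all the periods with it) rather than removing whole tags. Deleting newlines without a replacement also glues words together ("NorthernHotel", "ofsocks"). Here is a minimal sketch of what was probably intended, assuming the goal is to drop HTML tags such as `<b>` and collapse whitespace; the sample string is made up for illustration:

```python
import re

# "<", then any run of non-">" characters, then ">": removes whole tags,
# unlike the character class [<.*>], which removes single characters
TAG_RE = re.compile(r"<[^>]*>")
# One or more whitespace characters (spaces, tabs, newlines)
WS_RE = re.compile(r"\s+")


def clean_fragment(fragment: str) -> str:
    """Remove HTML tags and collapse whitespace runs into single spaces."""
    no_tags = TAG_RE.sub("", fragment)
    return WS_RE.sub(" ", no_tags).strip()


# Hypothetical sample in the style of the script page
sample = "Dawn breaks over the Great Northern.\n\n<b>COOPER</b>\nDiane, 6:18 a.m."
print(clean_fragment(sample))
# Dawn breaks over the Great Northern. COOPER Diane, 6:18 a.m.
```

In the notebook this would become `cl_scr = [clean_fragment(str(phrase)) for phrase in s]`. BeautifulSoup's own `scr.get_text()` is another way to get the text without tags, leaving only the whitespace cleanup to a regex.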

-----------------

### By a lucky turn of events, we found a very clean subtitle text file for The Big Lebowski. We are switching to it because we expect the NLTK analysis to give us a clearly negative sentiment score.

In [197]:
# Read the subtitle file into a new list, skipping blank lines and
# stripping each line's trailing newline (no remove() inside a loop)
with open("code/helpers/the_big_lebowski.txt", "r") as reader:
    clean_lines = [line.rstrip("\n") for line in reader if line != "\n"]
len(clean_lines)

3203
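A note on dropping blank lines: removing items from a list while looping over it (as in `for line in lines: if line == "\n": lines.remove(line)`) can skip elements, because every later item shifts one position left after a removal and the loop steps over the item right after each blank line. Two blank lines in a row would leave one behind. Filtering into a new list avoids this. A minimal sketch on made-up lines:

```python
def drop_blank_lines_buggy(lines):
    """Mutates a list while iterating over it; can leave blank lines behind."""
    lines = list(lines)  # work on a copy so the caller's list is untouched
    for line in lines:
        if line == "\n":
            lines.remove(line)
    return lines


def drop_blank_lines(lines):
    """Filter into a new list and strip the trailing newline in one pass."""
    return [line.rstrip("\n") for line in lines if line != "\n"]


sample = ["Hey.\n", "\n", "\n", "Hey, man.\n"]
print(drop_blank_lines_buggy(sample))  # ['Hey.\n', '\n', 'Hey, man.\n']
print(drop_blank_lines(sample))        # ['Hey.', 'Hey, man.']
```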

In [424]:
character_list = ["Dude Lebowski", "Walter Sobchak","Donny", "Jeffrey Lebowski", "Maude Lebowski", "Bunny Lebowski", "Brandt", "Stranger", "Marty", "DaFino", "Jackie Treehorn", "Nihilists", "Uli Kunkel", "Karl Hungus", "Franz", "Dieter", "Jesus Quintana", "Liam O'Brien", "Arthur Digby Sellers", "Larry Sellers", "Smokey", "Knox Harrington"]


In [425]:
print(character_list)

['Dude Lebowski', 'Walter Sobchak', 'Donny', 'Jeffrey Lebowski', 'Maude Lebowski', 'Bunny Lebowski', 'Brandt', 'Stranger', 'Marty', 'DaFino', 'Jackie Treehorn', 'Nihilists', 'Uli Kunkel', 'Karl Hungus', 'Franz', 'Dieter', 'Jesus Quintana', "Liam O'Brien", 'Arthur Digby Sellers', 'Larry Sellers', 'Smokey', 'Knox Harrington']


### We are going to try to insert user data into our database using this list of characters.

In [431]:
url = "http://localhost:5000"

In [432]:
res = requests.get(url)
res

<Response [200]>

In [434]:
# This will insert our users
for name in character_list:
    new_user = {
        "name":name
    }
    requests.get(url+"/insert/users", params=new_user)
    

In [420]:
# We magically created a list of lists.
# Each inner list is a scene from the movie, which we will use as a chat.
len(all_scenes)

27
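The notebook doesn't show how `all_scenes` was built. As a purely hypothetical stand-in (the real scenes were presumably split at actual scene breaks), here is one way to turn the flat `clean_lines` list into a list of lists of near-equal size, so the cells below can be run end to end. `chunk_lines` is an illustrative helper, not something from the original project:

```python
def chunk_lines(lines, n_chunks):
    """Split a flat list into n_chunks consecutive sublists of near-equal size.

    Hypothetical stand-in for the notebook's scene splitting: it ignores
    real scene boundaries and only balances the number of lines per chunk.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional line each
        end = start + base + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


# Hypothetical usage: all_scenes = chunk_lines(clean_lines, 27)
```

Spreading the remainder over the first few chunks keeps every chunk within one line of the others and always returns exactly `n_chunks` lists.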

In [435]:
# This will insert our chats into the chat collection and messages into the message collection
for scene in all_scenes:
    dialogue = {"chat": scene}
    requests.get(url + "/insert/chats", params=dialogue)
    for line in scene:
        message = {"message": line}
        requests.get(url + "/insert/messages", params=message)
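Both insert loops send their data as URL query parameters on GET requests. When a parameter value is a list, as `scene` is in the chat insert, `requests` encodes it as one repeated key per item (`chat=...&chat=...`). On the Flask side, `request.args.get("chat")` would then return only the first line, while `request.args.getlist("chat")` returns all of them. The standard library's `urlencode` with `doseq=True` produces the same repeated-key shape, so it can be used to preview what the server receives; the dialogue lines below are made up:

```python
from urllib.parse import parse_qs, urlencode

# Made-up dialogue lines standing in for one scene from all_scenes
scene = ["First line.", "Second line & more"]

# doseq=True expands a list value into one key=value pair per item,
# the same repeated-key shape requests builds from params={"chat": scene}
query = urlencode({"chat": scene}, doseq=True)
print(query)  # chat=First+line.&chat=Second+line+%26+more

# Decoding the query string shows the server gets two separate values
decoded = parse_qs(query)
print(decoded["chat"])  # ['First line.', 'Second line & more']

# Names with spaces or apostrophes are percent-encoded too
print(urlencode({"name": "Liam O'Brien"}))  # name=Liam+O%27Brien
```

Long scenes also make long URLs, and many servers cap URL length. Sending a JSON body instead (for example `requests.post(url + "/insert/chats", json=dialogue)`) would sidestep that, but it would need a matching change in the Flask route.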

### Not everything works the way we want it to.
- We tried to retrieve information from our database but are having difficulty walking and chewing gum at the same time.
- At least we can still have a little fun and analyze the sentiment scores of our sample dialogue.
- Our prediction is that it will be overwhelmingly negative, because there are 260 f-bombs and a lot of complaining from both Walter and the Dude.
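To check a tally like the 260 f-bombs, one option is a case-insensitive regex with word boundaries, so the word isn't matched inside unrelated longer words. A sketch with the target word left as a parameter; `count_word` is an illustrative helper, and `prefix=True` is an assumption about how you might want to count longer forms of the same word:

```python
import re


def count_word(lines, word, prefix=False):
    """Count case-insensitive occurrences of `word` across `lines`.

    With prefix=True, longer forms that start with `word` also count
    (e.g. "walk" would also match "walking" and "walked").
    """
    suffix = r"\w*" if prefix else ""
    pattern = re.compile(r"\b" + re.escape(word) + suffix + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(line)) for line in lines)


sample = ["The Dude abides.", "Walter, the DUDE is not in.", "dudes"]
print(count_word(sample, "dude"))               # 2
print(count_word(sample, "dude", prefix=True))  # 3
```

Against the subtitle file this would be `count_word(clean_lines, word, prefix=True)` with the word of your choice.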

In [444]:
# One-time download of the lexicon VADER needs (safe to re-run)
nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()

In [449]:
# One row of VADER scores (neg, neu, pos, compound) per line of dialogue
punc = pd.DataFrame([sia.polarity_scores(line) for scene in all_scenes for line in scene])
punc.head()

   neg  neu  pos  compound
0  0.0  1.0  0.0       0.0
1  0.0  1.0  0.0       0.0
2  0.0  1.0  0.0       0.0
3  0.0  1.0  0.0       0.0
4  0.0  1.0  0.0       0.0
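Before averaging, it helps to know how the columns are built. In NLTK's VADER implementation, `neg`, `neu` and `pos` are ratios that sum to about 1 for each line, while `compound` is the sum of the word valences squashed into [-1, 1] with x / sqrt(x² + α), where α = 15. Lines like the five above, which score a flat 0 compound, will pull the averages toward neutral. A pure-Python sketch of that normalization step, mirroring the idea behind NLTK's internal `normalize` helper (this is a re-implementation for illustration, not an import of it):

```python
import math


def normalize(score: float, alpha: float = 15) -> float:
    """Squash an unbounded valence sum into [-1, 1], VADER-style."""
    norm = score / math.sqrt(score * score + alpha)
    # Clamp for safety; mathematically |norm| < 1 already
    return max(-1.0, min(1.0, norm))


for total in (0.0, -1.0, 2.0, 20.0):
    print(total, round(normalize(total), 4))
# For -1.0 the result is -0.25, since sqrt(1 + 15) = 4
```

The curve is steep near zero and flattens out, so a handful of strongly negative words moves a single line's compound a lot, but a long run of neutral lines still dominates an average.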


In [448]:
punc.mean()

neg         0.083262
neu         0.826345
pos         0.090391
compound   -0.007196
dtype: float64

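### Hey, look at that! It's barely negative at all: the mean compound score is -0.007, and the average positive score is slightly higher than the negative one! We could try this on a scene-by-scene basis.
- But we will stop here for now... we don't want to take the magic out of everything all at once.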
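Trying the scene-by-scene idea would only take a small change to the cell above: tag each line's scores with its scene index and group by it. A sketch, assuming `all_scenes` is the list of lists used above; `scene_sentiment` is an illustrative helper that takes the scoring function as a parameter, so the analyzer's `polarity_scores` method can be passed in:

```python
import pandas as pd


def scene_sentiment(scenes, score):
    """Average sentiment scores per scene.

    scenes: list of lists of strings (like all_scenes)
    score:  function mapping a string to a dict of scores
            (e.g. SentimentIntensityAnalyzer().polarity_scores)
    """
    rows = [
        {"scene": i, **score(line)}
        for i, scene in enumerate(scenes)
        for line in scene
    ]
    # Scenes with no lines produce no rows and so drop out of the result
    return pd.DataFrame(rows).groupby("scene").mean()


# Usage in the notebook (most negative scenes first):
# scene_sentiment(all_scenes, sia.polarity_scores).sort_values("compound")
```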