# CharRNN - Cheesy Pick-Up Lines Generator
A _cheesy pick-up line generator_ that might just increase your chances of getting a Tinder date?

### 1. Data Cleaning
I collected the data earlier with a web scraper, but I had to filter out more than half of the pick-up lines by hand and modify others, since they were too direct, off-putting, offensive, or not gender-inclusive. The cells below clean up this mini dataset further.

In [None]:
# imports
import numpy as np
import pandas as pd
import re

In [None]:
data = pd.read_csv('pick-up-lines.txt', sep="\n", header=None)

In [None]:
data.rename(columns = {0: 'pick up lines'}, inplace=True)
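
Side note: some newer pandas releases raise a `ValueError` when `sep="\n"` is passed to `read_csv`. If that happens, a plain-Python loader along these lines might work as a fallback. This is only a sketch: `read_lines` and the sample file are hypothetical names made up for illustration, and it assumes one pick-up line per row with blank rows skipped.

```python
from pathlib import Path
import tempfile


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demo on a throwaway file (hypothetical content, not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample-lines.txt"
    sample.write_text(
        "Your hand looks heavy. Let me hold it for you.\n\nAre you Google?\n",
        encoding="utf-8",
    )
    lines = read_lines(sample)

print(lines)

# With pandas available, the list can be turned into the same one-column frame:
# data = pd.DataFrame({'pick up lines': lines})
```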

In [None]:
def cleanText(data, txt, clean_txt):
    
    # convert all text to lowercase
    data[clean_txt] = data[txt].str.lower()
    
    # remove all special characters
    data[clean_txt] = data[clean_txt].apply(lambda elem: re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?", "", elem))  
    
    # remove all numbers
    data[clean_txt] = data[clean_txt].apply(lambda elem: re.sub(r"\d+", "", elem))
    
    return data

In [None]:
data_clean = cleanText(data, 'pick up lines', 'cleaned data')

In [None]:
data_clean.head()

| | pick up lines | cleaned data |
|---|---|---|
| 0 | Can I have your picture so I can show Santa wh... | can i have your picture so i can show santa wh... |
| 1 | Are you Google? Because I've just found what I... | are you google because ive just found what ive... |
| 2 | If you stood in front of a mirror and held up ... | if you stood in front of a mirror and held up ... |
| 3 | Your hand looks heavy. Let me hold it for you. | your hand looks heavy let me hold it for you |
| 4 | I'm learning about important dates in history.... | im learning about important dates in history w... |
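
To see what the two regex passes in `cleanText` actually do, here is a minimal sketch that applies the same patterns to single strings. `clean_line` is a hypothetical helper, not part of the notebook, and the second sample sentence is made up for illustration.

```python
import re

# Same patterns as in cleanText, compiled once
SPECIALS = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?")
DIGITS = re.compile(r"\d+")


def clean_line(line):
    """Lowercase a line, strip special characters, then strip digits."""
    lowered = line.lower()
    no_specials = SPECIALS.sub("", lowered)
    return DIGITS.sub("", no_specials)


cleaned = clean_line("Your hand looks heavy. Let me hold it for you.")
print(cleaned)  # your hand looks heavy let me hold it for you

# Made-up sentence with digits and an apostrophe
with_digits = clean_line("I've got 99 problems!")
print(repr(with_digits))  # 'ive got  problems'
```

Stripping the digits leaves a double space behind, so collapsing repeated whitespace afterwards may be worth considering.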


In [None]:
!pip install nltk --quiet --upgrade


In [None]:
import nltk.corpus
nltk.download('stopwords') 

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
from nltk.corpus import stopwords

In [None]:
# remove all stopwords
stop = set(stopwords.words('english'))  # set gives fast membership checks
data_clean['cleaned data'] = data_clean['cleaned data'].apply(lambda x: ' '.join(word for word in x.split() if word not in stop))
data_clean.head()

# note: with the stopwords removed, the lines barely read as sentences anymore

| | pick up lines | cleaned data |
|---|---|---|
| 0 | Can I have your picture so I can show Santa wh... | picture show santa want christmas |
| 1 | Are you Google? Because I've just found what I... | google ive found ive searching |
| 2 | If you stood in front of a mirror and held up ... | stood front mirror held roses would see beauti... |
| 3 | Your hand looks heavy. Let me hold it for you. | hand looks heavy let hold |
| 4 | I'm learning about important dates in history.... | im learning important dates history wanna one |
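
The stopword filter is easy to try on a single line. The sketch below is a rough illustration, not the notebook's code: `drop_stopwords` is a hypothetical helper, and `STOP_SUBSET` is a tiny hand-picked stand-in for NLTK's full English list so the example runs without a download.

```python
# Illustrative subset only; the notebook uses NLTK's full English stopword list
STOP_SUBSET = {"your", "me", "it", "for", "you"}


def drop_stopwords(line, stop_words):
    """Keep only the whitespace-separated words that are not in stop_words."""
    return " ".join(word for word in line.split() if word not in stop_words)


filtered = drop_stopwords("your hand looks heavy let me hold it for you", STOP_SUBSET)
print(filtered)  # hand looks heavy let hold
```

Half of the words disappear from this one line, which is why the filtered text reads more like keywords than like a pick-up line.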


### 2. Append Data to Input File

In [None]:
# append cleaned data to input.txt
data_clean['cleaned data'].to_csv('input.txt', header=None, index=None, sep='\n', mode='a')
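
One thing to watch with `mode='a'`: re-running the cell appends the same lines again. A guard along these lines could avoid that. This is a sketch under assumptions: `append_new_lines` and the demo file name are hypothetical, and it assumes the file is small enough to read fully into memory.

```python
from pathlib import Path
import tempfile


def append_new_lines(path, lines):
    """Append only the lines not already present in the file; return how many were written."""
    target = Path(path)
    if target.exists():
        existing = set(target.read_text(encoding="utf-8").splitlines())
    else:
        existing = set()
    fresh = [line for line in lines if line not in existing]
    with target.open("a", encoding="utf-8") as handle:
        for line in fresh:
            handle.write(line + "\n")
    return len(fresh)


# Demo on a throwaway file instead of the real input.txt
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "input-demo.txt"
    first = append_new_lines(demo, ["hand looks heavy let hold"])
    second = append_new_lines(demo, ["hand looks heavy let hold"])  # simulated re-run
    contents = demo.read_text(encoding="utf-8")

print(first, second)  # 1 0
```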

In [None]:
# size of input data is still very small but it's about as large as i could make it :/
! du -h input.txt

264K	input.txt
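
Since the model works on characters, the character count and the number of distinct characters may say more than the file size alone. A minimal sketch, with `char_stats` as a hypothetical helper and a single cleaned line standing in for the real file:

```python
def char_stats(text):
    """Return (total character count, sorted list of distinct characters)."""
    return len(text), sorted(set(text))


sample = "hand looks heavy let hold\n"  # one cleaned line plus its newline
total, vocab = char_stats(sample)
print(total, len(vocab))  # 26 14

# For the real file (assuming input.txt exists in the working directory):
# with open("input.txt", encoding="utf-8") as f:
#     total, vocab = char_stats(f.read())
```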


### 3. Train the Model

- [x] [Training CharRNN for ml5.js](https://github.com/ml5js/training-charRNN)

In [14]:
!git clone https://github.com/ml5js/training-charRNN.git --quiet

In [None]:
%%shell

cd training-charRNN
python3 -m pip install tensorflow==1.15.0
bash run.sh