# OKCupid Project

In recent years, there has been a massive rise in the use of dating apps to find love. Many of these apps use sophisticated data science techniques to recommend possible matches and to optimize the user experience. They also give us access to a wealth of information that we've never had before about how different people experience romance. This project analyses data from OKCupid, an app that matches users based on their answers to multiple-choice and short-answer questions.

## The Dataset

The OKCupid Dataset contains the following features:

* `body_type`: multiple choice question with 12 possible answers: average, fit, athletic, thin, curvy, a little extra, skinny, full figured, overweight, jacked, used up, and rather not say.
* `diet`: Including variations of the options: anything, vegetarian, vegan, kosher, and halal.
* `drinks`: About the person's drinking habits. Options are: desperately, very often, often, socially, rarely, and not at all.
* `drugs`: The person's drug use, with the options never, sometimes, or often.
* `education`: The education level of the person, with 32 different options.
* `essay0` to `essay9`: Open short answers.
* `ethnicity`: The ethnicity of the person.
* `height`: The person's height.
* `income`: How much the person makes a year.
* `job`: 21 possible answers.
* `location`: where the person lives.
* `offspring`: whether they have any offspring and whether they might want any in the future (15 possible answers).
* `orientation`: Sexual orientation. Options are: straight, gay, or bisexual.
* `pets`: whether they like or dislike either dogs, cats or both and whether they have them as pets. 15 possible answers.
* `religion`: The religion the person follows and how important it is to them.
* `sex`: Male or Female.
* `sign`: The horoscope sign and the importance the person places on horoscope. 48 possible answers.
* `smokes`: the smoking habit of the person. Options are: yes, sometimes, when drinking, trying to quit, and no.
* `speaks`: the language/s the person speaks and the level (either poorly, okay, or fluently).
* `status`: the marital status, including the options: single, available, seeing someone, married, unknown.
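The answer counts quoted in the list above (for example, 12 options for `body_type` or 15 for `offspring`) can be checked directly against the data. The following is a minimal sketch, assuming the profiles are loaded into a pandas DataFrame; the helper name `answer_counts` and the small `sample` frame are hypothetical and exist only to keep the example self-contained.

```python
import pandas as pd


def answer_counts(df, columns):
    """Return the number of distinct non-missing answers per column."""
    # nunique ignores NaN by default, so unanswered questions are not counted.
    return df[columns].nunique(dropna=True).sort_values(ascending=False)


# Hypothetical miniature sample, not the real OKCupid data.
sample = pd.DataFrame({
    "drinks": ["socially", "often", "socially", None],
    "orientation": ["straight", "gay", "bisexual", "straight"],
    "status": ["single", "single", "available", "single"],
})

counts = answer_counts(sample, ["drinks", "orientation", "status"])
print(counts)
```

On the full dataset, the same helper could be called with the list of multiple-choice columns to compare the observed counts with the ones listed here.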

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [7]:
data = pd.read_csv('profiles.csv')
data.head()

Unnamed: 0,age,body_type,diet,drinks,drugs,education,essay0,essay1,essay2,essay3,...,location,offspring,orientation,pets,religion,sex,sign,smokes,speaks,status
0,22,a little extra,strictly anything,socially,never,working on college/university,about me:<br />\n<br />\ni would love to think...,currently working as an international agent fo...,making people laugh.<br />\nranting about a go...,"the way i look. i am a six foot half asian, ha...",...,"south san francisco, california","doesn&rsquo;t have kids, but might want them",straight,likes dogs and likes cats,agnosticism and very serious about it,m,gemini,sometimes,english,single
1,35,average,mostly other,often,sometimes,working on space camp,i am a chef: this is what that means.<br />\n1...,dedicating everyday to being an unbelievable b...,being silly. having ridiculous amonts of fun w...,,...,"oakland, california","doesn&rsquo;t have kids, but might want them",straight,likes dogs and likes cats,agnosticism but not too serious about it,m,cancer,no,"english (fluently), spanish (poorly), french (...",single
2,38,thin,anything,socially,,graduated from masters program,"i'm not ashamed of much, but writing public te...","i make nerdy software for musicians, artists, ...",improvising in different contexts. alternating...,my large jaw and large glasses are the physica...,...,"san francisco, california",,straight,has cats,,m,pisces but it doesn&rsquo;t matter,no,"english, french, c++",available
3,23,thin,vegetarian,socially,,working on college/university,i work in a library and go to school. . .,reading things written by old dead people,playing synthesizers and organizing books acco...,socially awkward but i do my best,...,"berkeley, california",doesn&rsquo;t want kids,straight,likes cats,,m,pisces,no,"english, german (poorly)",single
4,29,athletic,,socially,never,graduated from college/university,hey how's it going? currently vague on the pro...,work work work work + play,creating imagery to look at:<br />\nhttp://bag...,i smile a lot and my inquisitive nature,...,"san francisco, california",,straight,likes dogs and likes cats,,m,aquarius,no,english,single
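The first five rows already show unanswered questions (empty cells in `diet`, `drugs`, `offspring`, and `religion`). A quick way to see how common this is across columns is to compute the share of missing values per column. This is a sketch only: the helper name `missing_share` is an assumption, and the `sample` frame reuses a few values from the rows shown above instead of reading `profiles.csv`.

```python
import pandas as pd


def missing_share(df):
    """Return the fraction of missing values in each column, largest first."""
    # isna() gives a boolean frame; the column mean is the share of True values.
    return df.isna().mean().sort_values(ascending=False)


# Values copied from the five rows displayed above.
sample = pd.DataFrame({
    "diet": ["strictly anything", "mostly other", "anything", "vegetarian", None],
    "drugs": ["never", "sometimes", None, None, "never"],
    "status": ["single", "single", "available", "single", "single"],
})

shares = missing_share(sample)
print(shares)
```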


In [53]:
# Create a subset of the dataset with the multiple choice questions only.
mc = [
    'body_type', 'diet', 'drinks', 'drugs', 'education', 'ethnicity',
    'height', 'income', 'job', 'offspring', 'orientation', 'pets',
    'religion', 'sex', 'sign', 'smokes', 'speaks', 'status',
]
mc_data = data[mc]
mc_data.head()

Unnamed: 0,body_type,diet,drinks,drugs,education,ethnicity,height,income,job,offspring,orientation,pets,religion,sex,sign,smokes,speaks,status
0,a little extra,strictly anything,socially,never,working on college/university,"asian, white",75.0,-1,transportation,"doesn&rsquo;t have kids, but might want them",straight,likes dogs and likes cats,agnosticism and very serious about it,m,gemini,sometimes,english,single
1,average,mostly other,often,sometimes,working on space camp,white,70.0,80000,hospitality / travel,"doesn&rsquo;t have kids, but might want them",straight,likes dogs and likes cats,agnosticism but not too serious about it,m,cancer,no,"english (fluently), spanish (poorly), french (...",single
2,thin,anything,socially,,graduated from masters program,,68.0,-1,,,straight,has cats,,m,pisces but it doesn&rsquo;t matter,no,"english, french, c++",available
3,thin,vegetarian,socially,,working on college/university,white,71.0,20000,student,doesn&rsquo;t want kids,straight,likes cats,,m,pisces,no,"english, german (poorly)",single
4,athletic,,socially,never,graduated from college/university,"asian, black, other",66.0,-1,artistic / musical / writer,,straight,likes dogs and likes cats,,m,aquarius,no,english,single
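The `speaks` column packs several languages, each with an optional level, into one comma-separated string such as `english, german (poorly)`. One possible way to split it into `(language, level)` pairs is sketched below. The function name `parse_speaks` and the regular expression are assumptions made for illustration, and the sketch assumes entries follow the `language (level)` pattern seen in the rows above.

```python
import re

# One item looks like "english" or "spanish (poorly)".
_SPEAKS_ITEM = re.compile(
    r"^\s*(?P<language>[^()]+?)\s*(?:\((?P<level>[^)]*)\))?\s*$"
)


def parse_speaks(value):
    """Split a `speaks` string into (language, level) pairs.

    The level is None when the profile gives no level. Missing values
    (for example NaN) produce an empty list.
    """
    if not isinstance(value, str):
        return []
    pairs = []
    for item in value.split(","):
        match = _SPEAKS_ITEM.match(item)
        if match:  # items that do not fit the pattern are skipped
            pairs.append((match.group("language"), match.group("level")))
    return pairs


print(parse_speaks("english, german (poorly)"))
```

Applied to the whole column, for instance with `mc_data['speaks'].apply(parse_speaks)`, this would give one list of pairs per profile.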


In [57]:
# Create a subset of the dataset with the open short-answer questions
oq = [f'essay{i}' for i in range(10)]  # 'essay0' ... 'essay9'
oq_data = data[oq]
oq_data.head()

Unnamed: 0,essay0,essay1,essay2,essay3,essay4,essay5,essay6,essay7,essay8,essay9
0,about me:<br />\n<br />\ni would love to think...,currently working as an international agent fo...,making people laugh.<br />\nranting about a go...,"the way i look. i am a six foot half asian, ha...","books:<br />\nabsurdistan, the republic, of mi...",food.<br />\nwater.<br />\ncell phone.<br />\n...,duality and humorous things,trying to find someone to hang out with. i am ...,i am new to california and looking for someone...,you want to be swept off your feet!<br />\nyou...
1,i am a chef: this is what that means.<br />\n1...,dedicating everyday to being an unbelievable b...,being silly. having ridiculous amonts of fun w...,,i am die hard christopher moore fan. i don't r...,delicious porkness in all of its glories.<br /...,,,i am very open and will share just about anyth...,
2,"i'm not ashamed of much, but writing public te...","i make nerdy software for musicians, artists, ...",improvising in different contexts. alternating...,my large jaw and large glasses are the physica...,okay this is where the cultural matrix gets so...,movement<br />\nconversation<br />\ncreation<b...,,viewing. listening. dancing. talking. drinking...,"when i was five years old, i was known as ""the...","you are bright, open, intense, silly, ironic, ..."
3,i work in a library and go to school. . .,reading things written by old dead people,playing synthesizers and organizing books acco...,socially awkward but i do my best,"bataille, celine, beckett. . .<br />\nlynch, j...",,cats and german philosophy,,,you feel so inclined.
4,hey how's it going? currently vague on the pro...,work work work work + play,creating imagery to look at:<br />\nhttp://bag...,i smile a lot and my inquisitive nature,"music: bands, rappers, musicians<br />\nat the...",,,,,
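The essay texts above still contain HTML remnants such as `<br />` tags and embedded newlines, and other columns show entities like `&rsquo;`. Before any text analysis it may help to normalise them. The following is a minimal sketch using only the standard library; `clean_essay` is a hypothetical helper, and treating missing essays as empty strings is an assumption, not something the dataset prescribes.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")      # any HTML tag, e.g. <br />
_WHITESPACE = re.compile(r"\s+")   # runs of spaces, tabs, newlines


def clean_essay(text):
    """Strip HTML tags, decode entities, and collapse whitespace."""
    if not isinstance(text, str):
        return ""  # assumption: a missing essay becomes an empty string
    text = _TAG.sub(" ", text)     # remove tags first
    text = html.unescape(text)     # then decode entities such as &rsquo;
    return _WHITESPACE.sub(" ", text).strip()


print(clean_essay("about me:<br />\n<br />\ni would love to think"))
```

The helper could then be applied column by column, for example `oq_data['essay0'].apply(clean_essay)`.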
