# Reddit r/ForeverAlone Survey Analysis

r/ForeverAlone began as a subreddit for sharing the "forever alone" meme, but over time it became an identity and a place where people who have been alone for most of their lives can come and talk about their issues.

Tagline of r/ForeverAlone: [A subreddit for Forever Alone. lonely depressed sad anxiety](https://www.reddit.com/r/ForeverAlone/)

## Import Data

In [1]:
# import packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly as plty
# import plotly.plotly as py  # unused below; moved to the separate chart_studio package in plotly 4
import plotly.graph_objs as go
import plotly.figure_factory as ff
import plotly.tools as tls
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

%matplotlib inline

In [3]:
# load dataset
df = pd.read_csv('foreveralone.csv')

In [4]:
df.head()

Unnamed: 0,time,gender,sexuallity,age,income,race,bodyweight,virgin,prostitution_legal,pay_for_sex,friends,social_fear,depressed,what_help_from_others,attempt_suicide,employment,job_title,edu_level,improve_yourself_how
0,5/17/2016 20:04:18,Male,Straight,35,"$30,000 to $39,999",White non-Hispanic,Normal weight,Yes,No,No,0.0,Yes,Yes,"wingman/wingwoman, Set me up with a date",Yes,Employed for wages,mechanical drafter,Associate degree,
1,5/17/2016 20:04:30,Male,Bisexual,21,"$1 to $10,000",White non-Hispanic,Underweight,Yes,No,No,0.0,Yes,Yes,"wingman/wingwoman, Set me up with a date, date...",No,Out of work and looking for work,-,"Some college, no degree",join clubs/socual clubs/meet ups
2,5/17/2016 20:04:58,Male,Straight,22,$0,White non-Hispanic,Overweight,Yes,No,No,10.0,Yes,Yes,I don't want help,No,Out of work but not currently looking for work,unemployed,"Some college, no degree",Other exercise
3,5/17/2016 20:08:01,Male,Straight,19,"$1 to $10,000",White non-Hispanic,Overweight,Yes,Yes,No,8.0,Yes,Yes,date coaching,No,A student,student,"Some college, no degree",Joined a gym/go to the gym
4,5/17/2016 20:08:04,Male,Straight,23,"$30,000 to $39,999",White non-Hispanic,Overweight,No,No,Yes and I have,10.0,No,Yes,I don't want help,No,Employed for wages,Factory worker,"High school graduate, diploma or the equivalen...",


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 469 entries, 0 to 468
Data columns (total 19 columns):
time                     469 non-null object
gender                   469 non-null object
sexuallity               469 non-null object
age                      469 non-null int64
income                   469 non-null object
race                     469 non-null object
bodyweight               469 non-null object
virgin                   469 non-null object
prostitution_legal       469 non-null object
pay_for_sex              469 non-null object
friends                  469 non-null float64
social_fear              469 non-null object
depressed                469 non-null object
what_help_from_others    469 non-null object
attempt_suicide          469 non-null object
employment               469 non-null object
job_title                457 non-null object
edu_level                469 non-null object
improve_yourself_how     469 non-null object
dtypes: float64(1), int64(1), object(17)

In [6]:
df.describe()

Unnamed: 0,age,friends
count,469.0,469.0
mean,23.963753,7.956716
std,6.023526,34.3715
min,12.0,0.0
25%,20.0,1.0
50%,23.0,3.0
75%,26.0,7.0
max,70.0,600.0


In [7]:
df.duplicated().any()

False

In [8]:
df.isna().any()

time                     False
gender                   False
sexuallity               False
age                      False
income                   False
race                     False
bodyweight               False
virgin                   False
prostitution_legal       False
pay_for_sex              False
friends                  False
social_fear              False
depressed                False
what_help_from_others    False
attempt_suicide          False
employment               False
job_title                 True
edu_level                False
improve_yourself_how     False
dtype: bool
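
`isna().any()` only says whether a column contains missing values. Counting them shows how many rows a later `dropna` would remove; in the `df.info()` output above, job_title has 457 non-null entries out of 469, i.e. 12 missing. A minimal sketch of the counting step, using a small made-up frame (`sample` and its values are illustrative assumptions, not the survey data):

```python
import pandas as pd

# Hypothetical stand-in for the survey frame; values are made up for illustration
sample = pd.DataFrame({
    "age": [35, 21, 22, 19],
    "job_title": ["mechanical drafter", None, "unemployed", None],
})

# Number of missing values per column
missing_counts = sample.isna().sum()
print(missing_counts)
```

On the real data the same call would be `df.isna().sum()`.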

In [9]:
df.job_title.value_counts()

Student                                                         50
student                                                         25
none                                                            17
None                                                            12
Unemployed                                                      11
Engineer                                                         9
Student                                                          7
.                                                                6
-                                                                6
Software Engineer                                                5
Intern                                                           5
Cashier                                                          4
Nothing                                                          4
na                                                               4
Clerk                                                         

## Data Wrangling

<ul>
    <li> Change the datatype of friends from float to int, because a fractional number of friends (e.g. 5.5) is not possible.</li>
    <li> Remove rows where job_title is null. </li>
    <li> Normalize job_title values that differ only by letter case or by leading/trailing whitespace. </li>
</ul>

Convert the datatype of friends from float to int

In [10]:
# change datatype to int
df['friends'] = df['friends'].astype(np.int64)

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 469 entries, 0 to 468
Data columns (total 19 columns):
time                     469 non-null object
gender                   469 non-null object
sexuallity               469 non-null object
age                      469 non-null int64
income                   469 non-null object
race                     469 non-null object
bodyweight               469 non-null object
virgin                   469 non-null object
prostitution_legal       469 non-null object
pay_for_sex              469 non-null object
friends                  469 non-null int64
social_fear              469 non-null object
depressed                469 non-null object
what_help_from_others    469 non-null object
attempt_suicide          469 non-null object
employment               469 non-null object
job_title                457 non-null object
edu_level                469 non-null object
improve_yourself_how     469 non-null object
dtypes: int64(2), object(17)
memory usage:

Drop rows where job_title is null (the only column with missing values).

In [12]:
#drop rows with null values
df.dropna(inplace=True)

In [13]:
df.isna().any()

time                     False
gender                   False
sexuallity               False
age                      False
income                   False
race                     False
bodyweight               False
virgin                   False
prostitution_legal       False
pay_for_sex              False
friends                  False
social_fear              False
depressed                False
what_help_from_others    False
attempt_suicide          False
employment               False
job_title                False
edu_level                False
improve_yourself_how     False
dtype: bool

Rename values of job_title

In [14]:
# strip leading and trailing whitespace from strings
df['job_title'] = df.job_title.str.strip()

In [15]:
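# Function to replace job_title values (exact matches only)
def replace_text(what, to):
    # restrict the replacement to job_title so other columns are left untouched
    df['job_title'] = df['job_title'].replace(what, to)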

In [16]:
replace_text('student', 'Student')
replace_text('none', 'None')
replace_text("N/a", 'None')
replace_text('na', 'None')
replace_text('-', 'None')
replace_text('.', 'None')
replace_text('*', 'None')
replace_text('ggg', 'None')

In [17]:
df.job_title.value_counts()

Student                                                         82
None                                                            51
Unemployed                                                      13
Engineer                                                         9
Software Engineer                                                6
Intern                                                           5
Software developer                                               4
Cashier                                                          4
Nothing                                                          4
NEET                                                             3
Technician                                                       3
Clerk                                                            3
engineer                                                         3
Teacher                                                          3
Software Developer                                            
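
The counts above still list case variants such as `engineer` / `Engineer` and `Software developer` / `Software Developer` separately. One possible way to merge them is to normalize case and map placeholder answers in a single pass. The sketch below is an assumption about how that could be done, not the approach used in this notebook; the `titles` series, the `placeholders` set, and `normalize_title` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Made-up sample of raw job titles with the kinds of variants seen above
titles = pd.Series([
    "Student", "student ", " Student",
    "engineer", "Engineer",
    "none", "-",
    "Software developer", "Software Developer",
])

# Assumed list of answers that mean "no job title given"
placeholders = {"none", "n/a", "na", "-", ".", "*"}

def normalize_title(raw):
    """Strip whitespace, map placeholders to 'None', title-case the rest."""
    cleaned = raw.strip()
    if cleaned.lower() in placeholders:
        return "None"
    return cleaned.title()

normalized = titles.map(normalize_title)
counts = normalized.value_counts()
print(counts)
```

Note that `str.title()` also rewrites acronyms (`NEET` becomes `Neet`), so such values would need to be handled separately.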

## EDA

In [18]:
df.gender.value_counts()

Male                  382
Female                 72
Transgender male        2
Transgender female      1
Name: gender, dtype: int64

In [19]:
# Gender counts

gender_counts = df.gender.value_counts()
# take the labels from the counts themselves so bars and labels cannot get out of sync
data = [go.Bar(x = gender_counts.index,
              y = gender_counts.values)]
layout = go.Layout(
    title='Gender Frequency',
    xaxis=dict(
        title='Gender'
    ),
    yaxis=dict(
        title='Count'
        )
    )

fig = go.Figure(data=data, layout=layout)
plty.offline.iplot(fig)

In [20]:
# sexuality frequency

df.sexuallity.value_counts()

Straight       404
Bisexual        45
Gay/Lesbian      8
Name: sexuallity, dtype: int64

In [21]:
# Sexuality counts

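sexuality_counts = df.sexuallity.value_counts()
# take the labels from the counts themselves so bars and labels cannot get out of sync
data = [go.Bar(x = sexuality_counts.index,
              y = sexuality_counts.values)]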
layout = go.Layout(
    title='Sexuality Frequency',
    xaxis=dict(
        title='Sexuality'
    ),
    yaxis=dict(
        title='Count'
        )
    )

fig = go.Figure(data=data, layout=layout)
plty.offline.iplot(fig)

In [22]:
# body weight

df.bodyweight.value_counts()

Normal weight    265
Overweight       110
Underweight       57
Obese             25
Name: bodyweight, dtype: int64

In [23]:
def univariate_bar(column, ttitle, xlabel, ylabel):
    # count each category, skip 'Unspecified', and sort in descending order
    counts = df[column].value_counts()
    counts = counts[counts.index != 'Unspecified'].sort_values(ascending=False)
    # a plain list of traces replaces the deprecated go.Data wrapper
    data = [go.Bar(x = counts.index, y = counts.values)]
    layout = go.Layout(
        title = ttitle,
        xaxis = dict(title = xlabel),
        yaxis = dict(title = ylabel)
    )
    fig = go.Figure(data=data, layout=layout)
    return plty.offline.iplot(fig)

In [24]:
univariate_bar('bodyweight', 'Bodyweight Frequency', 'Weight', 'Counts')

In [25]:
univariate_bar('depressed', 'Number of People Depressed', ' ', 'Count')

In [26]:
univariate_bar('social_fear', 'Number of People having Social Fear', ' ', 'Count')

In [27]:
univariate_bar('attempt_suicide', 'Number of People Who Attempted Suicide', ' ', 'Count')

In [28]:
age = df['age']

trace = go.Histogram(x = age)

data = [trace]

layout = go.Layout(
    title = 'Age Distribution',
    xaxis = dict(
        title = 'Age'
    ),
    yaxis = dict(
        title ='Count'
    ))

fig = go.Figure(data, layout)
plty.offline.iplot(fig)

In [29]:
# Distribution of Friends

friends = df['friends']

trace = go.Histogram(x = friends)
data = [trace]

layout = go.Layout(
    title = 'Friends Distribution',
    xaxis = dict(
    title = 'Friend Count'),
    yaxis = dict(
    title = 'Count')
    )

fig = go.Figure(data, layout)
plty.offline.iplot(fig)
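
The `describe()` output reports a maximum of 600 friends against a 75th percentile of 7, so a few extreme answers stretch the histogram's x-axis. One option is to cap the plotted values at a chosen percentile before building the histogram. The sketch below uses made-up numbers, and the 90th-percentile cutoff is an arbitrary assumption for illustration:

```python
import pandas as pd

# Made-up friend counts with one extreme value, mimicking the skew described above
friends = pd.Series([0, 0, 1, 3, 3, 5, 7, 10, 12, 600])

# Assumed cutoff: keep values at or below the 90th percentile
cap = friends.quantile(0.9)
trimmed = friends[friends <= cap]

print(f"cap = {cap}, kept {len(trimmed)} of {len(friends)} values")
```

The `trimmed` series could then be passed to `go.Histogram(x = trimmed)` in place of the full column, with the cutoff stated in the plot title.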

In [30]:
male = df[df['gender'] == 'Male' ]
female = df[df['gender'] == 'Female' ]

male_age = male['age']
female_age = female['age']
trace1 = go.Histogram(x = male_age, 
                      name = 'Male',
                     opacity = 0.5)
trace2 = go.Histogram(x = female_age,
                      name = 'Female',
                     opacity = 0.5)

data = [trace1, trace2]

layout = go.Layout(
    title = 'Age Distribution by Gender',
    barmode='overlay',
    xaxis = dict(
    title = 'Age'),
    yaxis = dict(
    title = 'Count')
    )

fig = go.Figure(data, layout)
plty.offline.iplot(fig)

In [31]:
male_friends = male['friends']
female_friends = female['friends']
trace1 = go.Histogram(x = male_friends, 
                      name = 'Male',
                     opacity = 0.5)
trace2 = go.Histogram(x = female_friends,
                      name = 'Female',
                     opacity = 0.5)

data = [trace1, trace2]

layout = go.Layout(
    title = 'Friends Distribution by Gender',
    barmode='overlay',
    xaxis = dict(
    title = 'Friends'),
    yaxis = dict(
    title = 'Count')
    )

fig = go.Figure(data, layout)
plty.offline.iplot(fig)

### Conclusion
*  Most respondents are young adults, roughly between 18 and 30 years old.
*  Most of them report between 0 and 9 friends.
*  More than half of them have not attempted suicide.
*  Most of them are either depressed or have social fear.
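
Statements like these can be backed with explicit shares rather than read off the charts. A minimal sketch, assuming a small made-up frame with the same column names as the survey (the values are illustrative and are not results from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the survey frame; values are made up
sample = pd.DataFrame({
    "age": [19, 21, 22, 23, 35],
    "attempt_suicide": ["No", "No", "No", "Yes", "Yes"],
})

# Share of respondents aged 18 to 30 (between() is inclusive on both ends)
share_18_30 = sample["age"].between(18, 30).mean()

# Share of each answer to the suicide-attempt question
attempt_share = sample["attempt_suicide"].value_counts(normalize=True)

print(share_18_30)
print(attempt_share)
```

Running the same two expressions on `df` would give the actual proportions behind the first and third bullet points.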
