![Humble Bumble Hero](./images/bumble.gif)

# Humble Bumble Data Analyst Interview Challenge 🐝🍯
## Bumble Data Analyst Interview Challenge using Python, Pandas, and Matplotlib 🐝🍯


![Bumble Question 1](./images/bumble_Q1.gif)
# Question 1:


Please complete the shell function below so that, given a string `s`, it counts how many times each unique word occurs. The count should be case-insensitive and should ignore punctuation.

* The answer should be printed in alphabetical order.

* No libraries outside of the Python standard library can be used (i.e., no pandas, sklearn, nltk, etc.).


# Example:

### "I'm smart, I'm educated. It would have been a disservice to every woman to go away or hide." - Whitney Wolfe Founder of Bumble

------
```
Input: "I'm smart I'm educated. It would have been a disservice to every woman to go away or hide."
Output:
[
 ('a', 1),
 ('away', 1),
 ('been', 1),
 ('disservice', 1),
 ('educated', 1),
 ('every', 1),
 ('go', 1),
 ('have', 1),
 ('hide', 1),
 ("i'm", 2),
 ('it', 1),
 ('or', 1),
 ('smart', 1),
 ('to', 2),
 ('woman', 1),
 ('would', 1)
]
```
-----

In [19]:
punctuations=[',', '.', '!', '"', '?']

def word_count(s):
    # Lowercase so that counting is case-insensitive
    sentence = s.lower()
    # Strip each punctuation mark in turn, reassigning so every removal is kept
    for punctuation in punctuations:
        sentence = sentence.replace(punctuation, '')
    word_list = sentence.split()
    word_dict = {word: word_list.count(word) for word in word_list}
    return sorted(word_dict.items())

In [20]:
word_count("I'm smart I'm educated. It would have been a disservice to every woman to go away or hide.")

[('a', 1),
 ('away', 1),
 ('been', 1),
 ('disservice', 1),
 ('educated', 1),
 ('every', 1),
 ('go', 1),
 ('have', 1),
 ('hide', 1),
 ("i'm", 2),
 ('it', 1),
 ('or', 1),
 ('smart', 1),
 ('to', 2),
 ('woman', 1),
 ('would', 1)]
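
As a point of comparison, the same task can be sketched with `collections.Counter` and a regular expression, both from the standard library. The function name `word_count_counter` and the choice to treat runs of letters, digits, and apostrophes as words are assumptions made for this illustration, not part of the original challenge.

```python
import re
from collections import Counter


def word_count_counter(s):
    # Hypothetical alternative to word_count: lowercase the text, then keep
    # runs of letters, digits, and apostrophes as words. Everything else
    # (commas, full stops, quotes, and so on) acts as a separator.
    words = re.findall(r"[a-z0-9']+", s.lower())
    # Counter tallies each word; sorting the items gives alphabetical order.
    return sorted(Counter(words).items())


result = word_count_counter(
    "I'm smart I'm educated. It would have been a disservice "
    "to every woman to go away or hide."
)
print(result)
```

Because any character outside the pattern is treated as a separator, this version does not need an explicit punctuation list, although a stray leading or trailing apostrophe would stay attached to its word.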

![Bumble Question 2](./images/bumble_Q2.gif)

# Question 2: 

Using the given pandas dataframe, please calculate the ratio of messages sent to messages received (messages_sent / messages_received) split by country and gender, and visualise this in a way that is easy to digest and
understand. 

Please use any libraries you wish.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
# Creating the dataframe
messages = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'M', 'F', 'M', 'F', 'M', 'F', 'F', 'M', 'F', 'F'],
    'country': ['UK', 'UK', 'UK', 'UK', 'FR', 'FR', 'FR', 'UK', 'FR', 'UK', 'BR', 'BR', 'BR'],
    'messages_sent': [10, 12, 1, 4, 5, 92, 23, 14, None, 18, 12, 6, 9],
    'messages_received': [54, 12, 32, 12, 53, 11, 0, 0, 54, None, 13, 4, 14],
})
print(messages)

   gender country  messages_sent  messages_received
0       M      UK           10.0               54.0
1       F      UK           12.0               12.0
2       M      UK            1.0               32.0
3       M      UK            4.0               12.0
4       F      FR            5.0               53.0
5       M      FR           92.0               11.0
6       F      FR           23.0                0.0
7       M      UK           14.0                0.0
8       F      FR            NaN               54.0
9       F      UK           18.0                NaN
10      M      BR           12.0               13.0
11      F      BR            6.0                4.0
12      F      BR            9.0               14.0


![Bumble DF](./images/bumble_df.gif)

# Cleaning the NaNs with Zeros

In [21]:
# filling NaNs with zeros
messages = messages.fillna(0)
messages

   gender country  messages_sent  messages_received
0       M      UK           10.0               54.0
1       F      UK           12.0               12.0
2       M      UK            1.0               32.0
3       M      UK            4.0               12.0
4       F      FR            5.0               53.0
5       M      FR           92.0               11.0
6       F      FR           23.0                0.0
7       M      UK           14.0                0.0
8       F      FR            0.0               54.0
9       F      UK           18.0                0.0
10      M      BR           12.0               13.0
11      F      BR            6.0                4.0
12      F      BR            9.0               14.0


In [29]:
# Creating grouped table for messages received
total_messages_received_df = (
    messages.groupby(['country', 'gender'])
    .messages_received
    .sum()
    .to_frame()
    .reset_index()
)

total_messages_received_df

  country gender  messages_received
0      BR      F               18.0
1      BR      M               13.0
2      FR      F              107.0
3      FR      M               11.0
4      UK      F               12.0
5      UK      M               98.0


In [30]:
# Creating grouped table for messages sent
total_messages_sent_df = (
    messages.groupby(['country', 'gender'])
    .messages_sent
    .sum()
    .to_frame()
    .reset_index()
)

total_messages_sent_df

  country gender  messages_sent
0      BR      F           15.0
1      BR      M           12.0
2      FR      F           28.0
3      FR      M           92.0
4      UK      F           30.0
5      UK      M           29.0


# Analysis: 
* French males have the highest send/receive ratio (about 8.4), with 92 messages sent and only 11 received.

* French females and UK males appear very popular, with ratios of 26% and 30% respectively: both groups received far more messages than they sent.
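
The ratios quoted in the analysis can be reproduced, and plotted as the question asks, with a sketch along the following lines. It assumes the same zero-filling of NaNs used earlier and divides the summed messages sent by the summed messages received for each country and gender group. The names `totals` and `ratio_table`, the grouped bar chart, and the reference line at 1.0 are illustrative choices rather than the original solution.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the dataframe from the question and fill NaNs with zeros,
# matching the cleaning step used earlier.
messages = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'M', 'F', 'M', 'F', 'M', 'F', 'F', 'M', 'F', 'F'],
    'country': ['UK', 'UK', 'UK', 'UK', 'FR', 'FR', 'FR', 'UK', 'FR', 'UK', 'BR', 'BR', 'BR'],
    'messages_sent': [10, 12, 1, 4, 5, 92, 23, 14, None, 18, 12, 6, 9],
    'messages_received': [54, 12, 32, 12, 53, 11, 0, 0, 54, None, 13, 4, 14],
}).fillna(0)

# Sum both columns per (country, gender) group, then take the ratio of totals.
totals = messages.groupby(['country', 'gender'])[['messages_sent', 'messages_received']].sum()
totals['ratio'] = totals['messages_sent'] / totals['messages_received']

# Pivot gender into columns so each country gets one bar per gender.
ratio_table = totals['ratio'].unstack('gender')
print(ratio_table.round(2))

# Grouped bar chart; the dashed line marks a ratio of 1 (sent == received).
ax = ratio_table.plot(kind='bar', rot=0)
ax.axhline(1.0, color='grey', linestyle='--', linewidth=1)
ax.set_xlabel('Country')
ax.set_ylabel('messages_sent / messages_received')
ax.set_title('Sent-to-received message ratio by country and gender')
plt.tight_layout()
```

Taking the ratio of group totals, rather than averaging per-user ratios, avoids dividing by zero for individual users who received no messages. Every group in this dataset has a non-zero received total, so the division is safe here.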
