## Use Case: Fitbit

### Imagine you are a Data Scientist at Fitbit

You've been given user data to analyse, with the goal of finding insights that can be shown on the smartwatch.

#### But why would we want to analyse the user data for designing the watch?

These insights from the user data can help the business make customer-oriented decisions about the product design.



#### Let's first look at the data we have gathered

Link: https://drive.google.com/file/d/1Uxwd4H-tfM64giRS1VExMpQXKtBBtuP0/view?usp=sharing

<img src='https://drive.google.com/uc?id=1Uxwd4H-tfM64giRS1VExMpQXKtBBtuP0'>


#### Notice that there are some user features in the data

These are provided as the columns of the data.

#### Every row is called a record or data point


#### What are all the features provided to us? 

- Date
- Step Count
- Mood (Categorical)
- Calories Burned
- Hours of sleep
- Feeling Active (Categorical)


**Using NumPy, we will explore this data to look for some interesting insights - Exploratory Data Analysis.**

#### EDA is all about asking the right questions

#### What kind of questions can we answer using this data?

- How many records and features are there in the dataset?
- What is the **average step count**?
- On which day was the **step count highest/lowest?**


#### Can we find some deeper insights?

We can probably see how daily activity affects sleep and mood.

We will try finding:
- How does daily activity affect mood?

In [1]:
import numpy as np

In [2]:
fitbit = np.loadtxt('fit.txt', dtype='str')

In [3]:
fitbit.shape

(96, 6)

In [4]:
fitbit.ndim

2

In [5]:
fitbit[:5]

array([['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
       ['07-10-2017', '6041', 'Sad', '197', '8', 'Inactive'],
       ['08-10-2017', '25', 'Sad', '0', '5', 'Inactive'],
       ['09-10-2017', '5461', 'Sad', '174', '4', 'Inactive'],
       ['10-10-2017', '6915', 'Neutral', '223', '5', 'Active']],
      dtype='<U10')

There are 96 records and each record has 6 features.
These features are:
- Date
- Step count
- Mood
- Calories Burned
- Hours of sleep
- Activity status

#### Notice that the above array is homogeneous: it contains all the data as strings

In order to work with strings, categorical data and numerical data, we will have to save every feature separately.
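As a minimal sketch of why a single array cannot keep numbers as numbers next to text: a NumPy array holds one dtype, so mixing a string and an integer stores both as strings. The two values are copied from the first record; the variable name `mixed` is only illustrative.

```python
import numpy as np

# One date string and one integer step count in the same array.
mixed = np.array(['06-10-2017', 5464])

# NumPy keeps a single dtype per array, so the integer is stored as text.
print(mixed.dtype.kind)   # 'U' means Unicode string
print(mixed[1] == '5464')
```

This is why each feature is pulled out into its own array and converted to a suitable type.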

#### How will we extract the features into separate variables?

We can get some idea of how the data is saved.

Let's see what the first element of `fitbit` is.

In [6]:
fitbit[0]

array(['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
      dtype='<U10')

Hm, this extracts a row, not a column.

Think about it.

#### What's the way to change columns to rows and rows to columns?

Transpose
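A small sketch of the idea, using the first three records shown earlier as a stand-in for `fitbit` (the name `sample` is only for illustration). It also shows that column slicing with `[:, 0]` gives the same values as indexing the transpose.

```python
import numpy as np

# First three records, copied from the earlier output.
sample = np.array([
    ['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
    ['07-10-2017', '6041', 'Sad', '197', '8', 'Inactive'],
    ['08-10-2017', '25', 'Sad', '0', '5', 'Inactive'],
])

first_row = sample[0]      # one record: all 6 features of the first day
first_col = sample.T[0]    # one feature: the date of every record
same_col = sample[:, 0]    # column slicing picks the same values

print(sample.shape, sample.T.shape)
print(first_col)
```

After transposing, each row of `sample.T` is one feature, which is what makes it possible to unpack the features into separate variables.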

In [8]:
date = fitbit.T[0]
date

array(['06-10-2017', '07-10-2017', '08-10-2017', '09-10-2017',
       '10-10-2017', '11-10-2017', '12-10-2017', '13-10-2017',
       '14-10-2017', '15-10-2017', '16-10-2017', '17-10-2017',
       '18-10-2017', '19-10-2017', '20-10-2017', '21-10-2017',
       '22-10-2017', '23-10-2017', '24-10-2017', '25-10-2017',
       '26-10-2017', '27-10-2017', '28-10-2017', '29-10-2017',
       '30-10-2017', '31-10-2017', '01-11-2017', '02-11-2017',
       '03-11-2017', '04-11-2017', '05-11-2017', '06-11-2017',
       '07-11-2017', '08-11-2017', '09-11-2017', '10-11-2017',
       '11-11-2017', '12-11-2017', '13-11-2017', '14-11-2017',
       '15-11-2017', '16-11-2017', '17-11-2017', '18-11-2017',
       '19-11-2017', '20-11-2017', '21-11-2017', '22-11-2017',
       '23-11-2017', '24-11-2017', '25-11-2017', '26-11-2017',
       '27-11-2017', '28-11-2017', '29-11-2017', '30-11-2017',
       '01-12-2017', '02-12-2017', '03-12-2017', '04-12-2017',
       '05-12-2017', '06-12-2017', '07-12-2017', '08-12

In [9]:
date, step_count, mood, calories_burned, hours_sleep, activity = fitbit.T

In [11]:
date[:5]

array(['06-10-2017', '07-10-2017', '08-10-2017', '09-10-2017',
       '10-10-2017'], dtype='<U10')

In [12]:
step_count[:5]

array(['5464', '6041', '25', '5461', '6915'], dtype='<U10')

In [13]:
mood[:5]

array(['Neutral', 'Sad', 'Sad', 'Sad', 'Neutral'], dtype='<U10')

In [14]:
# Let's check the datatype

step_count.dtype

dtype('<U10')

In [16]:
step_count = np.array(step_count, int)

step_count

array([5464, 6041,   25, 5461, 6915, 4545, 4340, 1230,   61, 1258, 3148,
       4687, 4732, 3519, 1580, 2822,  181, 3158, 4383, 3881, 4037,  202,
        292,  330, 2209, 4550, 4435, 4779, 1831, 2255,  539, 5464, 6041,
       4068, 4683, 4033, 6314,  614, 3149, 4005, 4880, 4136,  705,  570,
        269, 4275, 5999, 4421, 6930, 5195,  546,  493,  995, 1163, 6676,
       3608,  774, 1421, 4064, 2725, 5934, 1867, 3721, 2374, 2909, 1648,
        799, 7102, 3941, 7422,  437, 1231, 1696, 4921,  221, 6500, 3575,
       4061,  651,  753,  518, 5537, 4108, 5376, 3066,  177,   36,  299,
       1447, 2599,  702,  133,  153,  500, 2127, 2203])
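The same conversion can also be written with the array's `astype` method. A minimal sketch on the first five step counts (the variable names here are illustrative, not part of the notebook):

```python
import numpy as np

# First five step counts as strings, copied from the earlier output.
steps_as_text = np.array(['5464', '6041', '25', '5461', '6915'])

# astype returns a new array with the requested dtype.
steps = steps_as_text.astype(int)

print(steps.dtype.kind)   # 'i' means signed integer
print(steps.sum())        # arithmetic now works on the values
```

Once the values are integers, functions such as `mean()` and `max()` operate on the numbers rather than on text.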

In [17]:
mood.dtype

dtype('<U10')

In [19]:
np.unique(mood, return_counts=True)

(array(['Happy', 'Neutral', 'Sad'], dtype='<U10'),
 array([40, 27, 29], dtype=int64))
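`np.unique` with `return_counts=True` returns two parallel arrays: the labels and their counts. One way to read them together is to pair them up in a dictionary. This is a sketch on the first five mood values only; `mood_sample` and `mood_counts` are illustrative names.

```python
import numpy as np

# First five mood labels, copied from the earlier output.
mood_sample = np.array(['Neutral', 'Sad', 'Sad', 'Sad', 'Neutral'])

labels, counts = np.unique(mood_sample, return_counts=True)

# Pair each label with its count, converting NumPy scalars to plain Python types.
mood_counts = {str(label): int(count) for label, count in zip(labels, counts)}
print(mood_counts)
```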

In [22]:
calories_burned.dtype

calories_burned = np.array(calories_burned, int)
calories_burned

array([181, 197,   0, 174, 223, 149, 140,  38,   1,  40, 101, 152, 150,
       113,  49,  86,   6,  99, 143, 125, 129,   6,   9,  10,  72, 150,
       141, 156,  57,  72,  17, 181, 197, 131, 154, 137, 193,  19, 101,
       139, 164, 137,  22,  17,   9, 145, 192, 146, 234, 167,  16,  17,
        32,  35, 220, 116,  23,  44, 131,  86, 194,  60, 121,  76,  93,
        53,  25, 227, 125, 243,  14,  39,  55, 158,   7, 213, 116, 129,
        21,  28,  16, 180, 138, 176,  99,   5,   1,  10,  47,  84,  23,
         4,   0,   0,   0,   0])

In [23]:
hours_sleep = np.array(hours_sleep, int)
hours_sleep

array([5, 8, 5, 4, 5, 6, 6, 7, 5, 6, 8, 5, 6, 7, 5, 6, 8, 5, 4, 5, 6, 8,
       5, 6, 5, 8, 5, 4, 5, 4, 5, 4, 3, 2, 9, 5, 6, 4, 5, 8, 4, 5, 6, 5,
       6, 5, 6, 5, 6, 5, 6, 7, 6, 7, 6, 5, 6, 7, 8, 8, 7, 8, 5, 4, 3, 3,
       4, 5, 5, 5, 3, 4, 4, 5, 5, 5, 5, 5, 5, 4, 3, 4, 5, 5, 4, 5, 3, 3,
       3, 2, 3, 2, 8, 5, 5, 5])

### Let's try to get some insights from the data.


#### What's the average step count? 

How can we calculate the average? => `.mean()`

In [24]:
step_count.mean()

2935.9375

#### Average step count for the user is `2935.9375`

#### On which day the step count was highest?

How will we find it?

First we find the index of the maximum step count, then use that index to get the date.

How will we find the index?

NumPy provides a function `np.argmax()` which returns the index of the maximum element.

Similarly, `np.argmin()` returns the index of the minimum element.
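A minimal sketch of the index-then-lookup idea on the first five records (the names `dates`, `steps`, `busiest` and `laziest` are illustrative). Because the two arrays are aligned row by row, an index found in one can be used in the other.

```python
import numpy as np

# First five dates and step counts, copied from the earlier output.
dates = np.array(['06-10-2017', '07-10-2017', '08-10-2017',
                  '09-10-2017', '10-10-2017'])
steps = np.array([5464, 6041, 25, 5461, 6915])

busiest = np.argmax(steps)   # position of the largest step count
laziest = np.argmin(steps)   # position of the smallest step count

print(dates[busiest], steps[busiest])
print(dates[laziest], steps[laziest])
```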

In [25]:
date[step_count.argmax()]

'14-12-2017'

#### Let's check the calories burned on that day

In [27]:
calories_burned[step_count.argmax()]

243

#### Step count on that particular day

In [28]:
step_count.max()

7422

#### Let's try to compare step counts on bad mood days and good mood days


In [33]:
mean_sad = np.mean(step_count[mood == 'Sad'])
mean_happy = np.mean(step_count[mood == 'Happy'])
std_sad = np.std(step_count[mood == 'Sad'])
std_happy = np.std(step_count[mood == 'Happy'])

mean_sad, mean_happy, std_sad, std_happy

(2103.0689655172414, 3392.725, 2021.2355035376254, 2088.4016254961593)
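The expression `step_count[mood == 'Sad']` relies on boolean masking. A small sketch on the first five records shows the intermediate steps (the `_sample` names are illustrative):

```python
import numpy as np

# First five mood labels and step counts, copied from the earlier output.
mood_sample = np.array(['Neutral', 'Sad', 'Sad', 'Sad', 'Neutral'])
steps_sample = np.array([5464, 6041, 25, 5461, 6915])

is_sad = mood_sample == 'Sad'      # one True/False value per record
sad_steps = steps_sample[is_sad]   # keeps only the records where the mask is True

print(is_sad)
print(sad_steps, sad_steps.mean())
```

The comparison produces a boolean array of the same length as `mood_sample`, and indexing with it selects the matching step counts before the mean is taken.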

#### Let's check the inverse: mood when the step count was higher/lower

In [35]:
# When step_count > 4000

np.unique(mood[step_count > 4000], return_counts = True)

(array(['Happy', 'Neutral', 'Sad'], dtype='<U10'),
 array([22,  9,  7], dtype=int64))

#### When the step count is above `4000`, the mood is mostly happy

In [36]:
# when step_count < 2000

np.unique(mood[step_count < 2000], return_counts = True)

(array(['Happy', 'Neutral', 'Sad'], dtype='<U10'),
 array([13,  8, 18], dtype=int64))

#### As we can see, on days with fewer steps (`<2000`), the mood is sad more often than any other mood
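The two groups have different sizes (38 and 39 days), so it can help to compare shares rather than raw counts. A rough sketch using the counts printed by the two `np.unique` calls (the variable names are illustrative):

```python
import numpy as np

moods = np.array(['Happy', 'Neutral', 'Sad'])
counts_above_4000 = np.array([22, 9, 7])    # from the step_count > 4000 output
counts_below_2000 = np.array([13, 8, 18])   # from the step_count < 2000 output

# Divide each count by the group total to get the share of days per mood.
share_above = counts_above_4000 / counts_above_4000.sum()
share_below = counts_below_2000 / counts_below_2000.sum()

for m, a, b in zip(moods, share_above, share_below):
    print(f"{m:8s} >4000: {a:.2f}   <2000: {b:.2f}")
```

On these counts, happy days make up a larger share of the high-step group, and sad days a larger share of the low-step group.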

# Conclusion

## There may be a correlation between mood and step count