# Device-based Identity

## Welcome to your notebook!

This is a new notebook, so the code, variables, and functions you defined in the last notebook don't exist here.

We'll use this notebook to explore a dataset of features collected from different users' phones and made [publicly available](http://extrasensory.ucsd.edu/) as part of a study at UCSD.

## Loading the dataset

We'll use the same pandas library to load our new dataset.

In [None]:
import pandas as pd

data = pd.read_csv("data/device_data.csv", index_col=0)
data.head()

This dataset has a lot of features, and this time, the column names are not anonymized!

Let's take a look at what we have.

In [None]:
list(data.columns)
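
With this many columns, it can help to separate the label columns from everything else. The sketch below assumes that label columns share the `label:` prefix seen in `label:PHONE_ON_TABLE`; the other column names are hypothetical placeholders, not names taken from the dataset.

```python
# Hypothetical stand-in for list(data.columns). Only "user" and
# "label:PHONE_ON_TABLE" appear in this notebook; the rest are made up.
columns = ["user", "feature_a", "feature_b", "label:PHONE_ON_TABLE", "label:EXAMPLE"]

# Split the names by the assumed "label:" prefix
label_columns = [name for name in columns if name.startswith("label:")]
other_columns = [name for name in columns if not name.startswith("label:")]

print(label_columns)
print(other_columns)
```

In the notebook itself, replacing `columns` with `list(data.columns)` should give the same kind of split for the real dataset.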

## Building our model

Our "inputs" and "outputs" are a little different now, as we aren't making a direct prediction.

Instead, we're interested in using the data to understand user behavior, so we won't include information about which user an observation is associated with when we project the data into two dimensions with t-SNE.

In [None]:
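from sklearn.manifold import TSNE

# Features only: drop the user column so it cannot influence the projection
X = data.drop('user', axis=1)
# Keep the user for each observation so we can compare against it later
y = data['user']

# Project every observation down to 2 dimensions; random_state makes the result repeatable
X_reduced_tsne = TSNE(n_components=2, random_state=22).fit_transform(X.values)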
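
To see what `fit_transform` returns without waiting on the full dataset, here is a minimal sketch on synthetic data. The array `X_demo` and its size are assumptions made for illustration; the point is only that t-SNE turns each row of features into one 2-D point.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for X.values: 60 observations with 10 features each
X_demo = rng.normal(size=(60, 10))

# perplexity is set explicitly here because it must be smaller than the number of observations
embedding = TSNE(n_components=2, perplexity=10, random_state=22).fit_transform(X_demo)

# One row per observation, one column per output dimension
print(embedding.shape)
```

The two output columns are what get plotted as the x and y coordinates of each observation.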

Let's take a look at our projected observations.

In [None]:
import matplotlib.pyplot as plt

f, (ax1) = plt.subplots(1, 1, figsize=(16,16))
# Every point is the same color here, so no colormap is needed
ax1.scatter(X_reduced_tsne[:,0], X_reduced_tsne[:,1], c='gray', linewidths=2)

## Evaluating our model

Clearly there are clusters, but how well do they help us understand identity?

Let's see the same mapping, but color each observation by the user it belongs to in our dataset.

In [None]:
f, (ax1) = plt.subplots(1, 1, figsize=(16,16))
ax1.scatter(X_reduced_tsne[:,0], X_reduced_tsne[:,1], c=y, cmap='viridis', linewidths=2)
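
The plot gives a visual answer; a number can complement it. One possible option, which is an addition and not part of the original notebook, is scikit-learn's `silhouette_score`, which is close to 1 when points sharing a label sit together and far from other labels, and near 0 when labels are mixed. The sketch below uses a synthetic embedding with two hypothetical users so that it runs on its own.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic 2-D embedding: two hypothetical users, 50 points each, far apart
points_user_0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
points_user_1 = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
embedding = np.vstack([points_user_0, points_user_1])
users = np.array([0] * 50 + [1] * 50)

# Score with the true users, then with the users shuffled as a baseline
score = silhouette_score(embedding, users)
shuffled_score = silhouette_score(embedding, rng.permutation(users))

print(score, shuffled_score)
```

In the notebook, the equivalent call would be `silhouette_score(X_reduced_tsne, y)`. Because t-SNE does not preserve distances between far-apart groups, treat the result as a rough indicator, not a precise measurement.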

## Model iteration

Choose specific labels to dive deeper into your exploration of the dataset. What do you notice?

In [None]:
# Which column do you want to focus on? (Check out the labels in the columns list for options)
COLUMN_NAME = "label:PHONE_ON_TABLE"

# What value should be in that column? 0: label not applied, 1: label applied
VALUE = 1

# Give your chart a useful title
TITLE = "My Chart"

temp_data = data[data[COLUMN_NAME] == VALUE]

X = temp_data.drop('user', axis=1)
y = temp_data['user']

X_reduced_tsne = TSNE(n_components=2, random_state=22).fit_transform(X.values)

f, (ax1) = plt.subplots(1, 1, figsize=(16,16))
ax1.scatter(X_reduced_tsne[:,0], X_reduced_tsne[:,1], c=y, cmap='viridis', linewidths=2)
plt.title(TITLE)
plt.show()
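
Filtering on a label can leave very few observations for some users, which makes the resulting chart hard to interpret. A quick check, sketched here on a tiny made-up table rather than the real dataset, is to count the remaining rows per user before running t-SNE.

```python
import pandas as pd

# Tiny made-up table standing in for `data`; the values are illustrative only
demo = pd.DataFrame({
    "user": [0, 0, 1, 1, 2],
    "label:PHONE_ON_TABLE": [1, 0, 1, 1, 0],
})

# Same filter as in the cell above, applied to the demo table
subset = demo[demo["label:PHONE_ON_TABLE"] == 1]

# How many observations does each user still have after filtering?
counts = subset["user"].value_counts().sort_index()

print(len(subset))
print(counts)
```

Running `temp_data['user'].value_counts()` in the notebook gives the same kind of summary for whichever `COLUMN_NAME` and `VALUE` you chose.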