# Data Science Technical Assessment

At Nexus Equities we are looking for motivated self-starters who can perform technical analysis on a variety of datasets and create informative visualizations.

To test your ability to think for yourself, as well as your technical skills, we have prepared a short assessment that will indicate how well you can undertake tasks and provide value to the firm.

### Assessment Outline

The assessment will be broken into two parts:
1) A few technical questions to assess your Python and data wrangling skills
2) A mini project based on a research question

You may be asked to explain some of your answers during the in-person interview.

# Part 1: Technical questions

## Python

### Palindrome

Given a string, write a Python function to check whether it is a palindrome. A string is a palindrome if it reads the same when reversed. For example, “radar” is a palindrome, but “radix” is not.

Example:

    string_input = "radar"
    result = is_palindrome(string_input)
    result == True
    
    string_input = "radix"
    result = is_palindrome(string_input)
    result == False

#### Function

In [17]:
def is_palindrome(string_input):
    pass

#### Test

In [18]:
true_input = "radar"
is_palindrome(true_input) 
# Should return True

In [19]:
false_input = "radix"
is_palindrome(false_input)
# Should return False

## Data Wrangling

This part tests your ability to use pandas for data wrangling, an essential skill for preparing data for analysis.

### Column Transformation

In [20]:
import pandas as pd

housing = pd.read_csv("data/housing.csv")

In [21]:
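# numeric_only=True skips the text column ocean_proximity (needed on pandas 2.0 and later)
housing.corr(numeric_only=True)['median_house_value'].sort_values()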

latitude             -0.144160
longitude            -0.045967
population           -0.024650
total_bedrooms        0.049686
households            0.065843
housing_median_age    0.105623
total_rooms           0.134153
median_income         0.688075
median_house_value    1.000000
Name: median_house_value, dtype: float64

Based on a simple correlation analysis, median_income has the strongest correlation with median_house_value, which is the variable we are trying to predict. From previous research we know that combining several features can lead to higher correlations.
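To illustrate how a combined feature can correlate more strongly with a target than either of its parts, here is a minimal sketch on invented data. The columns `rooms`, `households`, `rooms_per_household` and `price`, and all of the values, are hypothetical and are not taken from housing.csv.

```python
import pandas as pd

# Invented toy data (not from housing.csv): by construction, price depends on
# the rooms-per-household ratio rather than on either raw column.
toy = pd.DataFrame(
    {
        "rooms": [100, 200, 300, 400, 500, 600],
        "households": [50, 40, 150, 100, 500, 200],
    }
)

# Combine two raw features into a single ratio feature.
toy["rooms_per_household"] = toy["rooms"] / toy["households"]
toy["price"] = 100 * toy["rooms_per_household"]

# Correlation of every column with the target, sorted from lowest to highest.
correlations = toy.corr()["price"].sort_values()
print(correlations)
```

In this constructed case the ratio tracks the target exactly while each raw column does not. Real data will rarely be this clean, so treat the comparison as something to check rather than assume.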

Using pandas please complete the following tasks:
    
    1) Create this column: 
        bedroom_to_room_ratio = total_bedrooms/total_rooms
        
    2) Create a new 'near_water' column containing 1 if the 'ocean_proximity' of a property is '<1H OCEAN', 'NEAR OCEAN', 'NEAR BAY' or 'ISLAND', and 0 if it is 'INLAND'

In [23]:
housing.ocean_proximity.value_counts()

<1H OCEAN     9136
INLAND        6551
NEAR OCEAN    2658
NEAR BAY      2290
ISLAND           5
Name: ocean_proximity, dtype: int64

#### The resulting dataframe should look like this

In [29]:
housing_answer = pd.read_csv("data/desired_result_housing.csv", index_col=0)

In [36]:
housing_answer[['total_rooms', 'total_bedrooms', 'ocean_proximity', 'bedroom_to_room_ratio', 'near_water']]

Unnamed: 0,total_rooms,total_bedrooms,ocean_proximity,bedroom_to_room_ratio,near_water
0,880.0,129.0,NEAR BAY,0.146591,1
1,7099.0,1106.0,NEAR BAY,0.155797,1
2,1467.0,190.0,NEAR BAY,0.129516,1
3,1274.0,235.0,NEAR BAY,0.184458,1
4,1627.0,280.0,NEAR BAY,0.172096,1
...,...,...,...,...,...
20635,1665.0,374.0,INLAND,0.224625,0
20636,697.0,150.0,INLAND,0.215208,0
20637,2254.0,485.0,INLAND,0.215173,0
20638,1860.0,409.0,INLAND,0.219892,0


### Work

In [24]:
# Show your working here

# Part 2: Project

The CSV file chicago_sales.csv contains a table of 6685 sales in the Chicago MSA (Metropolitan Statistical Area). The investment team is interested in the sales trends, particularly over the last 10 years.

Questions you may wish to answer:
- Has the average price_per_acre_lot_area increased over the last 10 years? 5 years?
- What is the average number of sales per property?
- Did some years have higher deal counts (# of sales) than others? If so, is there a trend?
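As a starting point for questions like these, the sketch below shows one possible way to group sales by year and by property. It runs on a small invented table, not on chicago_sales.csv: only the column names `sale_date`, `sale_amount` and `property_id` come from the file, while `toy_sales`, `sale_year` and every value are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the sales table; all values are invented.
toy_sales = pd.DataFrame(
    {
        "sale_date": ["2015-03-01", "2015-09-15", "2016-01-20", "2016-07-04", "2016-11-30"],
        "sale_amount": [100000.0, None, 250000.0, 0.0, 300000.0],
        "property_id": ["a", "b", "a", "c", "a"],
    }
)

# Parse the date strings so the calendar year can be extracted.
toy_sales["sale_date"] = pd.to_datetime(toy_sales["sale_date"])
toy_sales["sale_year"] = toy_sales["sale_date"].dt.year

# Deal count per year: the number of rows (sales) recorded in each year.
deals_per_year = toy_sales.groupby("sale_year").size()

# Mean sale amount per year. mean() skips missing values, but a 0.0 amount
# still counts, so decide explicitly whether zero is a real price.
mean_amount_per_year = toy_sales.groupby("sale_year")["sale_amount"].mean()

# Number of sales per property, then the average across properties.
sales_per_property = toy_sales.groupby("property_id").size()
average_sales_per_property = sales_per_property.mean()

print(deals_per_year)
print(mean_amount_per_year)
print(average_sales_per_property)
```

How missing and zero amounts are handled is a judgment call that can change the averages noticeably, so it is worth stating the choice alongside any result.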

This is purposely an open-ended question, as we are primarily testing your ability to think for yourself and assess which insights would be useful to an investment analyst.

Hint: Visualizations are an effective way to back up your points. Feel free to add your own written analysis using Markdown cells in the Jupyter notebook.

In [25]:
import pandas as pd
import matplotlib.pyplot as plt

In [26]:
sales = pd.read_csv("data/chicago_sales.csv", index_col=0)

In [27]:
sales.head()

Unnamed: 0,sales_id,sale_date,sale_amount,price_per_acre_lot_area,price_per_building_area,recording_date,property_id
0,75a77049-e275-5f1e-a3cb-397bfd706ad8,2016-10-10,0.0,,0.0,2016-10-13,5405f32d-0e37-5f48-910b-13ad98f2ebdf
1,ec383140-9e36-566b-8ea2-0a61fb887d27,2003-05-20,,,,2003-05-20,5405f32d-0e37-5f48-910b-13ad98f2ebdf
2,50afe2a0-3389-57e3-8cc7-d94c734bef71,2020-08-20,1750000.0,292161.875232,41.037426,2020-12-30,f5dced65-05eb-5c79-a305-cb9c0184cf5a
3,d56d9ec0-7491-5294-b215-74d9f86e7551,2008-03-13,1200000.0,200339.571588,28.139949,2008-03-27,f5dced65-05eb-5c79-a305-cb9c0184cf5a
4,c34b87da-d2be-5ad6-877a-df9c94c90a7d,1985-08-01,37000.0,14800.0,2.202381,1985-08-01,f5dced65-05eb-5c79-a305-cb9c0184cf5a


In [28]:
# Show your working here