# Clever Data Analyst Test
## By Conor Powell

### First Look
##### The data appears to be clean: a check for nulls found nothing that needed filling. I made a few changes in Excel, mainly removing formatting for ease of ingestion.

### Best Markets


#### This slowly became an optimization question, so here are the variables I created and how I used them to find the best markets.


#### Price Increase Rate
##### The first thing I thought of researching was the ratio of price increases to price decreases. Hypothetically, a booming market would see more price increases than decreases. This can of course be skewed, as we don't know whether listings closed above or below listing price, nor do we have any of the other data that impacts property price. A value of 1.0 would mean price increases and decreases are equally common, and anything above 1.0 would mean there are more increases than decreases.
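##### As a minimal sketch of this ratio in plain Python: the row layout, market names, and counts below are hypothetical, and the function name `price_increase_rate` is my own, not taken from the original analysis.

```python
from collections import Counter

# Hypothetical rows: (market, number of price increases, number of price decreases).
listings = [
    ("Market A", 2, 10),
    ("Market A", 1, 5),
    ("Market B", 3, 4),
]

def price_increase_rate(rows):
    """Return {market: total price increases / total price decreases}."""
    increases, decreases = Counter(), Counter()
    for market, up, down in rows:
        increases[market] += up
        decreases[market] += down
    # Markets with no recorded decreases are skipped to avoid dividing by zero.
    return {m: increases[m] / decreases[m] for m in decreases if decreases[m] > 0}

rates = price_increase_rate(listings)
```

##### Sorting `rates` from highest to lowest would then give a ranking of the kind shown in the table.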


| Market | Price Increase Rate |
| --- | --- |
| Portland, OR | 0.175316 |
| Houston, TX | 0.122764 |
| San Jose, CA | 0.117438 |
| San Antonio, TX | 0.091678 |
| Fresno, CA | 0.083582 |


#### Closing Rate
##### A booming market should be closing properties, so comparing the number of properties on the market with the number that closed indicates how many of the listings are closed yearly. A 1.0 would mean everything was closed.


| Market | Closing Rate |
| --- | --- |
| Grand Rapids, MI | 0.704462 |
| Buffalo, NY | 0.695563 |
| Providence, RI | 0.650317 |
| San Jose, CA | 0.637220 |
| Cincinnati, OH | 0.587072 |


#### Price Change per Day on Market (PCD)
##### A booming market should be seeing an increase in prices. The value of this metric is twofold:
##### 1. Houses in booming markets are in demand, which drives up prices and shortens closing times. This means we could turn assets around quickly in these areas.
##### 2. There is less risk if we have to hold onto these assets for longer.
##### This was calculated by subtracting the median opening price from the median closing price, then dividing by the median days on market, so a positive value is a gain per day.
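##### A rough sketch of the PCD calculation for a single market follows. It assumes the change is taken as closing price minus opening price, so that a gain is positive and a negative value is a loss per day. The tuple layout and the numbers are hypothetical.

```python
from statistics import median

# Hypothetical listings for one market: (opening price, closing price, days on market).
listings = [
    (300_000, 310_000, 20),
    (250_000, 255_000, 10),
    (400_000, 420_000, 40),
]

def price_change_per_day(rows):
    """Median closing price minus median opening price, per median day on market."""
    open_med = median(r[0] for r in rows)
    close_med = median(r[1] for r in rows)
    days_med = median(r[2] for r in rows)
    return (close_med - open_med) / days_med

pcd = price_change_per_day(listings)
```

##### Because each median is taken separately, the result describes the market as a whole rather than any individual listing.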


| Market | PCD |
| --- | --- |
| San Jose, CA | $3478.94 |
| Buffalo, NY | $3341.66 |
| San Francisco, CA | $1547.62 |
| Detroit, MI | $601.14 |
| Milwaukee, WI | $597.62 |


#### Combined all of the above
##### Each market was ranked on each of the metrics above, and the ranks were added together. The market with the lowest combined total ranks highest overall.
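##### The rank-sum idea can be sketched as follows. The markets and metric values are hypothetical, higher is assumed to be better for every metric, and ties are not handled specially (tied values keep their input order).

```python
def rank_descending(values):
    """Map each key to its rank, where the largest value gets rank 1."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {key: position for position, key in enumerate(ordered, start=1)}

# Hypothetical metric values per market (higher is better for every metric).
metrics = {
    "price_increase_rate": {"A": 0.10, "B": 0.05, "C": 0.02},
    "closing_rate": {"A": 0.50, "B": 0.70, "C": 0.40},
    "pcd": {"A": 900.0, "B": 300.0, "C": 1200.0},
}

per_metric_ranks = {name: rank_descending(vals) for name, vals in metrics.items()}
markets = metrics["pcd"].keys()
combined = {m: sum(ranks[m] for ranks in per_metric_ranks.values()) for m in markets}

# The lowest combined total is the best overall market.
overall = sorted(combined, key=combined.get)
```

##### Adding another variable, such as a population rank, only requires one more entry in `metrics`.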


| Rank | Market | Price Increase Rank | Price Increase Rate | Closing Rate Rank | Closing Rate | PCD Rank | PCD | Total Combined Ranks |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | San Jose, CA | 3 | 0.117438 | 4 | 0.637220 | 1 | $3478.94 | 8 |
| 2 | San Francisco, CA | 12 | 0.066316 | 3 | 0.489864 | 3 | $1547.62 | 27 |
| 3 | Buffalo, NY | 24 | 0.048980 | 2 | 0.695593 | 2 | $3341.66 | 28 |
| 4 | Minneapolis, MN | 14 | 0.062971 | 15 | 0.482529 | 15 | -$85.40 | 42 |
| 5 | Detroit, MI | 32 | 0.035230 | 9 | 0.545000 | 4 | $601.14 | 45 |


### The Bay Area sounds like a good bet, with San Jose and San Francisco ranking high overall. Fresno, in the nearby Central Valley, also comes in 5th for price increase rate.
### For trends, we could be looking at the Great Lakes as an emerging area, especially after the last few decades in Detroit. With Minneapolis seeing a small loss per day on the market and still sitting at rank 15, we could be seeing more Midwest booms happening, all the more so given that Buffalo is also on the Great Lakes and that Milwaukee and Grand Rapids are nearby. This might support looking further into Cleveland, which sits right between Buffalo and Detroit.


#### Including Population and Listing Closed Price
##### In some cases we may also want to look at the total population available and the actual closing price to better optimize our choices. Here is what the ranking above would have looked like with a population rank and a listing close price rank included in the calculation.

| Rank | Market | Total Combined Ranks |
| --- | --- | --- |
| 1 | San Francisco, CA | 41 |
| 2 | San Jose, CA | 45 |
| 3 | San Diego, CA | 67 |
| 4 | Washington DC | 79 |
| 5 | Minneapolis, MN | 84 |


#### The full data file is in GitHub under full_reports/top_markets.csv

### Best Agents
##### Cleaning Notes: Asha Smith was missing the number of customer reviews and the average rating, so both were replaced with the median of each column.
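##### A minimal sketch of that median fill, assuming missing values are represented as `None`. The records and field names here are hypothetical placeholders, not the real agent data.

```python
from statistics import median

# Hypothetical agent records; None marks a missing value.
agents = [
    {"name": "Agent 1", "reviews": 10, "rating": 4.0},
    {"name": "Agent 2", "reviews": 30, "rating": 4.6},
    {"name": "Agent 3", "reviews": None, "rating": None},
    {"name": "Agent 4", "reviews": 20, "rating": 3.8},
]

def fill_with_median(rows, field):
    """Replace missing values of `field` with the median of the observed values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    for r in rows:
        if r[field] is None:
            r[field] = fill

for field in ("reviews", "rating"):
    fill_with_median(agents, field)
```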


#### Much like the problem above, this was about finding the best agent across a number of different variables. I wanted the median calls per close and the median close price for each agent, and because these needed prior work, this step came first. I joined both calls and sales to the customers table, then aggregated by Agent ID to join onto the agents table. These variables should help us identify agents who can close deals quickly as well as agents who are aiming for higher-priced properties.
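#### The join and aggregation could look roughly like this in plain Python. Every table layout and field name (`customer_id`, `agent_id`, `close_price`) is an assumption for illustration, and the sketch assumes one sale per customer.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical tables; every field name here is an assumption.
customers = [
    {"customer_id": 1, "agent_id": "a1"},
    {"customer_id": 2, "agent_id": "a1"},
    {"customer_id": 3, "agent_id": "a2"},
]
calls = [{"customer_id": c} for c in (1, 1, 1, 2, 3, 3)]
sales = [
    {"customer_id": 1, "close_price": 300_000},
    {"customer_id": 2, "close_price": 200_000},
    {"customer_id": 3, "close_price": 450_000},
]

calls_per_customer = Counter(call["customer_id"] for call in calls)
agent_of = {c["customer_id"]: c["agent_id"] for c in customers}

# Join calls and sales through the customer, then group by agent.
per_agent = defaultdict(lambda: {"calls": [], "prices": []})
for sale in sales:
    cid = sale["customer_id"]
    stats = per_agent[agent_of[cid]]
    stats["calls"].append(calls_per_customer[cid])
    stats["prices"].append(sale["close_price"])

summary = {
    agent: {
        "median_calls_per_close": median(s["calls"]),
        "median_close_price": median(s["prices"]),
    }
    for agent, s in per_agent.items()
}
```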


#### The first variable I created was a weighted average rating for the agents. I noticed some agents with few reviews had very high or very low scores, which would create outliers in a ranking. I used a simple regularization method used by IMDb for its reviews to create this weighted average and get a better idea of the actual value of the reviews themselves. I also included total closes to offset some of the variance that might come from the above.
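#### For illustration, here is the commonly cited form of the IMDb weighted rating, WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is the agent's average rating, v is their number of reviews, C is the mean rating across all agents, and m is a prior weight. The value of m and the sample numbers below are assumptions, not the ones used in this analysis.

```python
def weighted_rating(avg_rating, n_reviews, global_mean, prior_weight):
    """Shrink an agent's average rating toward the global mean.

    With few reviews the result stays close to global_mean; with many
    reviews it approaches the agent's own average.
    """
    total = n_reviews + prior_weight
    return (n_reviews / total) * avg_rating + (prior_weight / total) * global_mean

# Hypothetical inputs: a global mean of 4.0 and a prior weight of 10 reviews.
few_reviews = weighted_rating(5.0, 2, global_mean=4.0, prior_weight=10)
many_reviews = weighted_rating(5.0, 90, global_mean=4.0, prior_weight=10)
```

#### Two agents with the same 5.0 average end up with different weighted ratings: the one with only 2 reviews is pulled most of the way back to the global mean, while the one with 90 reviews keeps most of their score.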


#### The final variable was the total closure rate which was just the sum of closed properties over the sum of given properties. This should give us insight into how much of the given assets have been closed.


#### After all these variables were calculated and joined together, I ranked each of them, added the ranks together, then ranked each agent by total score, partitioned by market. Filtering for only the rank 1 agent in each market gave a list of the best agents per market. The top 5 by total rank are below, and the full CSV file is in GitHub at full_reports/top_agents.csv. It includes all the details on rankings and breakdowns of the stats.
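#### The partition-and-filter step can be sketched like this. The agents, markets, and rank sums are hypothetical, and a lower rank sum is assumed to be better.

```python
# Hypothetical agents with their summed per-variable ranks (lower is better).
agents = [
    {"agent": "P", "market": "X", "rank_sum": 40},
    {"agent": "Q", "market": "X", "rank_sum": 25},
    {"agent": "R", "market": "Y", "rank_sum": 30},
    {"agent": "S", "market": "Y", "rank_sum": 55},
]

def best_agent_per_market(rows):
    """Keep the agent with the lowest rank sum in each market."""
    best = {}
    for row in rows:
        current = best.get(row["market"])
        if current is None or row["rank_sum"] < current["rank_sum"]:
            best[row["market"]] = row
    return best

winners = best_agent_per_market(agents)
# Order the per-market winners by rank sum to get a company-wide ranking.
company_ranking = sorted(winners.values(), key=lambda r: r["rank_sum"])
```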


| Company Rank | Agent | Market | Sum of Rankings |
| --- | --- | --- | --- |
| 1 | Raj Hill | Raleigh, NC | 127 |
| 2 | Hassan Carter | Louisville, KY | 143 |
| 3 | Asha Khan | Grand Rapids, MI | 205 |
| 4 | Carlos Young | Riverside, CA | 224 |
| 5 | Luis Hall | Richmond, VA | 275 |


#### I believe this method lets the given variables highlight an agent's multiple skills in their profession. It combines a variety of data points and rankings to find the most consistent and highest-ranked agent in each location.

### One Area for Growth


#### Most of the top markets are cities near a large body of water and/or another big city. We can factor a lot of this out simply because humans have always settled near navigable waterways for trade and travel. But there is a large group of Great Lakes markets and agents that seems to be making strides. Given this, I think a sales or marketing strategy in the commuting suburbs near or between two good water-adjacent markets would be ideal. Markets that sit in between (e.g. Cleveland and its suburbs, between Buffalo at #3 and Detroit at #5) would most likely let us represent areas that could see major growth, which we could profit from at low risk. If these markets keep growing, the value of these assets will most likely increase as well.

### What if you had more time and data sources?


#### First things first, I would run a cluster analysis on the best markets to see whether there are any emerging markets that are statistically close to the markets I identified. We could also slice some of the emerging markets data to see if it matches certain patterns or trends within a market.


#### I would want more data on the houses being sold so I can be more specific about whether a house under-performed given the market. With no square footage, room count, or any other metric that impacts housing prices, it is hard to get specific on an agent's performance during a sale. More specific location data would also help, since even the street can make a difference to a house's price and its performance on the market.


#### More data is needed for calls: a timestamp showing when the call ended, and what stage of the buying/selling process or of your sales pipeline the customer is in, would be useful. This can also be hard because we are dealing with individual customers, and how they interact with us can change with the market, their mood, the news, and everything in between.


#### With more time to go over the agents, I would have liked to get more specific about whether they are better at closing certain properties and whether we can optimize which properties get handed off to each agent.

#### Marketing data would be nice for addressing the growth question in more depth, particularly around market or sales strategy.

### Describe Your Process

#### The first thing I saw when you asked me to find the "best" market or agent was that it's really impossible to define the best, since an agent can be the best at any number of things. Boiling it down to a single measure would have been naive and not representative of the agent or market as a whole. Instead, a number of different variables had to be combined and then ranked in order to find the "best" across multiple categories, which is more indicative of the market or agent. After getting the variables and ranks, it was then as easy as adding them together and finding the markets and agents with the lowest combined ranks overall. This method also lets us add further variables later as more data becomes available, and focus on specific variables if we want to drill down more.