# Aiola Data Analyst Home Assignment

We're excited to move forward with the next step in the process for the Data Analyst position!

The purpose of this assignment is to evaluate:
 - Your technical knowledge
 - How comfortable you are translating business logic into analysis/code
 - How you approach unfamiliar data
 - Your creativity in problem-solving and ideation when faced with new challenges and projects  
 
The Customer Delivery team works closely with product management, Aiola's clients, R&D and other key stakeholders in a data-rich environment. Below you will find a real Aiola use case and questions relating to product development, business logic, and technical implementation, which represent a Data Analyst's day-to-day at Aiola.

The CSV file you have been provided with has a week's worth of advertising data. You don't need any familiarity with advertising to approach the dataset and work with it. The Delivery team works directly with companies from a wide range of industries, so we aren't looking for domain expertise, just to understand how you handle data.

Please leave relevant comments and test your code as necessary, even if you are unable to solve a question. It helps us understand your skills better.

Good luck!

# Media company use case 

Aiola’s delivery team is responsible for developing the business logic and analytic capabilities of Aiola’s artificially intelligent analyst for a new client - a large media group that operates a number of TV networks and channels. Clients like the media company are considered “low-tech”, and use traditional methods for their research, analysis, and forecasting. 

Our product brings on-demand analytical capabilities into their meetings, calls and chats through their client-of-choice (WhatsApp, Slack, Teams), offering insights on their data and generating reports  – wherever they work, Aiola is available. 

The delivery team is working on the initial skill package, the first set of capabilities that the virtual analyst will provide, automating some of the media company’s data analysis processes. 

The skills that are in development use internal data from the client. The dataset has information about advertisers and their brands, and details about the advertisements they purchased. The client has also provided the delivery team with publicly available data on other networks. 

The delivery team is working on three skills that the media company’s innovation director determined would have a significant impact on their internal users:

 - Audience penetration and frequency analysis, including a joint score calculation. Penetration is the percentage of product category users among the show's viewers, and frequency is the average number of times per month that the show's viewers purchase a product in this category. The skill calculates and visualizes the performance of one or more categories among the viewers of the top-scoring shows. Note: the skill displays information relevant to shows that advertise the category; for example, tobacco products are not relevant to children's shows. 
 - Category investment analysis, where a weighted investment score is calculated, comparing the investment in a particular show to investments across all programs, and a visualization of the difference between the weighted score and actual investment in the show per category.   
 - A dynamic bi-weekly snapshot generated from the client's updated data, showing investments of brands and advertisers in specific shows and across networks.  

Below you will find several questions about this use case and the example skills.

##### Who are the end-users of the product?

##### How will they use the product? If there is more than one user group, what are the differences in how they will use it?

##### Can you suggest other features to add and modifications for the in-development features? What is the impact of each addition or modification?

Please be detailed in your response. If you suggest a feature or advanced capability that will require other datasets or sources, note this in your answer, along with the impact the additional data will have on the skills.

##### Can you explain how feature modification will impact the users? 

# Coding section

##### Import libraries and packages

In [2]:
import pandas as pd
import numpy as np
import matplotlib

##### Load in and observe data

In [4]:
df = pd.read_csv('anonymized_data.csv')

In [7]:
# df.head()

# Convert the 'Date' column from strings to datetime64 values, then display it
df['Date'] = pd.to_datetime(df['Date'])
df['Date']

0       2020-12-10
1       2020-12-10
2       2020-12-10
3       2020-12-10
4       2020-12-10
           ...    
43856   2020-10-18
43857   2020-10-18
43858   2020-10-18
43859   2020-10-18
43860   2020-10-18
Name: Date, Length: 43861, dtype: datetime64[ns]

##### How many advertisers are in the file?

In [None]:
print(df['Advertiser'].nunique())
# number of distinct values within the column

##### On average, how many brands are there per advertiser?

In [None]:
# number of distinct brands for each advertiser, averaged over all advertisers
print(df.groupby(['Advertiser'])['Brand'].nunique().mean())

##### Which brand purchased the largest number of slots?

In [None]:
# value_counts() counts rows (slots) per brand, sorted from largest to smallest
x = df['Brand'].value_counts().rename('Counter')
print(x.head(1))
# Brand_131 is the one that ordered the largest number of slots (by slots, not time)

##### Show the top purchasers of slots per week day

In [None]:
df = pd.read_csv('anonymized_data.csv')  # same file as loaded above
# count slots per (week day, brand) pair
x = df.groupby(['Week Day'])['Brand'].value_counts()

# for each week day, keep the brand with the highest slot count
x1 = x.index.get_level_values(0)
x2 = x.loc[x.groupby(x1).idxmax()]

x2

# Needless to say, this took me a while, but I truly believe it is a very important exercise.

#### Please convert the 'Date' column to a date type

##### Format the results from the previous query as follows:
##### "{week_day}: {brand} purchased {number of spots}"

Example output: 

Thursday: Brand_131 purchased 675  
Sunday: Brand_0 purchased 583  
Monday: Brand_131 purchased 582  
Friday: Brand_131 purchased 551  
Saturday: Brand_0 purchased 527  
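
One possible way to produce this format is sketched below. It is not necessarily the intended solution: the small `df` is a made-up stand-in for the real file, and only the `Week Day` and `Brand` column names come from the notebook. The sketch counts spots per day and brand, keeps the top brand for each day, and builds one f-string per day, ordered by number of spots.

```python
import pandas as pd

# Hypothetical toy data standing in for anonymized_data.csv
df = pd.DataFrame({
    "Week Day": ["Monday", "Monday", "Monday", "Monday", "Sunday", "Sunday", "Sunday"],
    "Brand":    ["Brand_1", "Brand_1", "Brand_1", "Brand_2", "Brand_0", "Brand_0", "Brand_1"],
})

# Number of spots per (week day, brand)
counts = df.groupby(["Week Day", "Brand"]).size().reset_index(name="Spots")

# Sort by spots (largest first) and keep the first, i.e. top, brand for each day
top = counts.sort_values("Spots", ascending=False).drop_duplicates("Week Day")

# One formatted line per week day
lines = [
    f"{day}: {brand} purchased {spots}"
    for day, brand, spots in zip(top["Week Day"], top["Brand"], top["Spots"])
]
print("\n".join(lines))
```

If two brands tie on a given day, this sketch keeps whichever row the sort happens to place first, so a tie-breaking rule would need to be agreed on for real data.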

##### Find the market share of slots by brand on a specific day (for example: Monday)

In [None]:
df = pd.read_csv('anonymized_data.csv')  # same file as loaded above
#x = df.groupby(['TimeBand', 'Week Day'])['TimeBand'].count() # gen number of brands per slot
#y = df.groupby(['Brand', 'TimeBand', 'Week Day'])['TimeBand'].count() # specific number of brands per slot
#x.to_frame()
#y.to_frame()

# Number of slots per brand on Monday
x = df[df['Week Day'] == 'Monday']['Brand'].value_counts()
# Market share of slots per brand on Monday, in percent
market_share = x / x.sum() * 100
print(market_share)


# Note: there are time frames which are not correct (such as hour 25 or 26 of the day)


##### Find the average spot time length (in "Length" column) for each spot type (column "Type") 

In [None]:
df.head(5)
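
A minimal sketch of one way to answer this question, assuming `Length` holds numeric durations (or strings that can be parsed as numbers). The toy `df` and its `Type` values are invented for illustration; only the column names `Type` and `Length` come from the assignment text.

```python
import pandas as pd

# Hypothetical toy data; the real values of 'Type' are not known here
df = pd.DataFrame({
    "Type":   ["Spot", "Spot", "Promo", "Promo", "Promo"],
    "Length": ["30", "20", "10", "15", "20"],
})

# Coerce to numbers; anything unparseable becomes NaN and is ignored by mean()
df["Length"] = pd.to_numeric(df["Length"], errors="coerce")

# Average spot length for each spot type, longest first
avg_length = df.groupby("Type")["Length"].mean().sort_values(ascending=False)
print(avg_length)
```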

##### Plot (in a bar chart) the top ten brands and the share of Saturday slots they purchased
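
The question can be read in more than one way. The sketch below assumes "top ten" means the ten brands with the most Saturday slots, and "share" means each brand's percentage of all Saturday slots. The toy `df` is invented; `Week Day` and `Brand` are the column names used elsewhere in the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for anonymized_data.csv
df = pd.DataFrame({
    "Week Day": ["Saturday"] * 5 + ["Monday"] * 2,
    "Brand": ["Brand_0", "Brand_0", "Brand_0", "Brand_131", "Brand_7",
              "Brand_0", "Brand_7"],
})

# Brands of all Saturday slots
saturday = df.loc[df["Week Day"] == "Saturday", "Brand"]

# Percentage of Saturday slots per brand, largest first, top ten only
share = saturday.value_counts(normalize=True).mul(100).head(10)

ax = share.plot.bar()
ax.set_xlabel("Brand")
ax.set_ylabel("Share of Saturday slots (%)")
ax.set_title("Top brands by share of Saturday slots")
plt.tight_layout()
```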

##### For the top 5 brands which purchased the most spots over the week, visualize the number of spots they purchased per day
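
One hedged way to approach this is a grouped bar chart built from a day-by-brand count table. The toy `df` is invented, and the full English day names in `day_order` are an assumption based on the example output shown earlier in the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for anonymized_data.csv
df = pd.DataFrame({
    "Week Day": ["Monday", "Monday", "Tuesday", "Tuesday", "Tuesday", "Monday"],
    "Brand":    ["Brand_0", "Brand_1", "Brand_0", "Brand_0", "Brand_1", "Brand_2"],
})

# The five brands with the most spots over the whole week
top5 = df["Brand"].value_counts().head(5).index
subset = df[df["Brand"].isin(top5)]

# Rows: week days, columns: brands, values: number of spots
per_day = pd.crosstab(subset["Week Day"], subset["Brand"])

# Put the days in calendar order (assumed spelling of the day names)
day_order = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]
per_day = per_day.reindex([d for d in day_order if d in per_day.index])

ax = per_day.plot.bar()
ax.set_xlabel("Week day")
ax.set_ylabel("Number of spots")
ax.set_title("Spots per day for the top 5 brands")
plt.tight_layout()
```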

##### List the most popular time slot per Category
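
A possible sketch, assuming the time slot is stored in the `TimeBand` column (seen in the commented-out code above) and that the category column is literally named `Category`. The toy values, including the time-band format, are made up.

```python
import pandas as pd

# Hypothetical toy data; the real TimeBand format may differ
df = pd.DataFrame({
    "Category": ["Food", "Food", "Food", "Cars", "Cars"],
    "TimeBand": ["20:00-21:00", "20:00-21:00", "08:00-09:00",
                 "21:00-22:00", "21:00-22:00"],
})

# Number of spots per (category, time band)
counts = df.groupby(["Category", "TimeBand"]).size().reset_index(name="Spots")

# Keep the time band with the most spots for each category
most_popular = (
    counts.sort_values("Spots", ascending=False)
          .drop_duplicates("Category")
          .set_index("Category")
)
print(most_popular)
```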

##### Show all data available for a certain brand or advertiser

You are given the following parameters that are passed from the user interface:

 - brand_or_advertiser_selection
 - brand_or_advertiser_value
 
The parameters indicate user selections. 
 
For instance, if the parameter "brand_or_advertiser_selection" is set to "advertiser", then the user chose to get information about an advertiser. The parameter "brand_or_advertiser_value" holds the name of the brand or advertiser to show.

Use the given parameters to show information relevant to the end-user. You may set the parameters as you see fit. 

Hint: remember to account for edge cases in your code
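
The function below is a rough sketch of one way to handle these parameters, not a reference solution. The name `show_data` is hypothetical, as is the choice of edge cases: an invalid selection or an empty value raises an error, matching ignores case and surrounding whitespace, and an unknown name returns an empty table.

```python
import pandas as pd

def show_data(df, brand_or_advertiser_selection, brand_or_advertiser_value):
    """Return all rows for the selected brand or advertiser (hypothetical helper)."""
    columns = {"brand": "Brand", "advertiser": "Advertiser"}

    # Edge case: selection is missing, misspelled, or in a different case
    selection = str(brand_or_advertiser_selection).strip().lower()
    if selection not in columns:
        raise ValueError(
            f"selection must be 'brand' or 'advertiser', "
            f"got {brand_or_advertiser_selection!r}"
        )

    # Edge case: no value was passed from the user interface
    if brand_or_advertiser_value is None or not str(brand_or_advertiser_value).strip():
        raise ValueError("brand_or_advertiser_value must not be empty")

    # Compare names ignoring case and surrounding whitespace
    value = str(brand_or_advertiser_value).strip().lower()
    names = df[columns[selection]].astype(str).str.strip().str.lower()

    # Edge case: an unknown name simply yields an empty DataFrame
    return df[names == value]

# Hypothetical toy data and parameter values
df = pd.DataFrame({
    "Advertiser": ["Advertiser_1", "Advertiser_1", "Advertiser_2"],
    "Brand":      ["Brand_0", "Brand_131", "Brand_7"],
})
result = show_data(df, "advertiser", " advertiser_1 ")
print(result)
```

Returning an empty table for an unknown name keeps the function easy to call from a user interface, which can then decide how to tell the user that nothing was found.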

##### Take a close look at the data. Can you spot any problems with it? How would you correct them?

Hint: perform univariate analysis for each column in the table to discover issues with the values
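
As a starting point for the univariate analysis, the sketch below summarizes each column and runs two example checks. The helper `column_summary`, the toy data, and the specific checks (inconsistent spellings of `Brand`, non-positive `Length`) are illustrative assumptions, not known problems in the real file.

```python
import pandas as pd

def column_summary(df):
    """Per-column dtype, missing-value count, and distinct-value count (hypothetical helper)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(dropna=True),
    })

# Hypothetical toy data with deliberately planted issues
df = pd.DataFrame({
    "Brand":  ["Brand_0", "brand_0 ", None, "Brand_1"],
    "Length": [30, -5, 20, 20],
})

summary = column_summary(df)
print(summary)

# Check 1: names that differ only by case or surrounding whitespace
normalized = df["Brand"].str.strip().str.lower()
variants = df["Brand"].groupby(normalized).nunique()
suspicious = variants[variants > 1]

# Check 2: spot lengths that cannot be real durations
bad_lengths = df[df["Length"] <= 0]
```

The same pattern extends to other columns, for example `df[col].value_counts()` for categorical columns and `df[col].describe()` for numeric ones, to surface out-of-range values such as the hours above 24 noted earlier.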