# Y.Music

# Contents <a id='back'></a>

* [Introduction](#intro)
* [Stage 1. Data Overview](#data_review)
    * [Initial Conclusions](#data_review_conclusions)
* [Stage 2. Data Preprocessing](#data_preprocessing)
    * [2.1 Header Style](#header_style)
    * [2.2 Missing Values](#missing_values)
    * [2.3 Duplicate Data](#duplicates)
    * [2.4 Data Preprocessing Conclusions](#data_preprocessing_conclusions)
* [Stage 3. Hypotheses Testing](#hypotheses)
    * [3.1 Hypothesis 1: User Activity in Two Cities](#activity)
    * [3.2 Hypothesis 2: Music Preferences on Monday and Friday](#week)
    * [3.3 Hypothesis 3: Genre Preferences in Springfield and Shelbyville](#genre)
* [Insights](#end)

## Introduction <a id='intro'></a>

Every research project starts with a hypothesis that can be tested. Sometimes the data supports the hypothesis and sometimes it does not. To make informed decisions, a business must be able to tell whether its assumptions are true.

In this project, I compare music preferences in the cities of Springfield and Shelbyville. Using real Y.Music data, I test the hypotheses below and compare user behavior in the two cities.

### Goals
Test three hypotheses:
1. User activity varies depending on the day and the city.
2. On Monday morning, residents of Springfield and Shelbyville listen to different genres. The same holds on Friday night.
3. Users in Springfield and Shelbyville have different preferences. In Springfield, they prefer pop music, while in Shelbyville, rap music has more fans.

### Steps 

The data on user behavior is stored in the file `/datasets/music_project_en.csv`. Nothing is known about the quality of the data, so it must be examined before any hypothesis is tested.

First, I will evaluate the data quality and determine whether any of the issues are significant. Then, during the data preprocessing stage, I will address the most serious problems.

This project consists of three steps:
 1. Data Overview
 2. Preprocessing Data
 3. Testing the Hypothesis
 
[Back to Table of Contents](#back)

## Stage 1. Data Overview <a id='data_review'></a>

In [None]:
# Import Pandas
import pandas as pd

In [None]:
df = pd.read_csv('/datasets/music_project_en.csv')
df.describe()

Unnamed: 0,userID,Track,artist,genre,City,time,Day
count,65079,63736,57512,63881,65079,65079,65079
unique,41748,39666,37806,268,2,20392,3
top,A8AE9169,Brand,Kartvelli,pop,Springfield,08:14:07,Friday
freq,76,136,136,8850,45360,14,23149


In [None]:
df.head(10)

Unnamed: 0,userID,Track,artist,genre,City,time,Day
0,FFB692EC,Kamigata To Boots,The Mass Missile,rock,Shelbyville,20:28:33,Wednesday
1,55204538,Delayed Because of Accident,Andreas Rönnberg,rock,Springfield,14:07:09,Friday
2,20EC38,Funiculì funiculà,Mario Lanza,pop,Shelbyville,20:58:07,Wednesday
3,A3DD03C9,Dragons in the Sunset,Fire + Ice,folk,Shelbyville,08:37:09,Monday
4,E2DC1FAE,Soul People,Space Echo,dance,Springfield,08:34:34,Monday
5,842029A1,Chains,Obladaet,rusrap,Shelbyville,13:09:41,Friday
6,4CB90AA5,True,Roman Messer,dance,Springfield,13:00:07,Wednesday
7,F03E1C1F,Feeling This Way,Polina Griffith,dance,Springfield,20:47:49,Wednesday
8,8FA1D3BE,L’estate,Julia Dalia,ruspop,Springfield,09:17:40,Friday
9,E772D5C0,Pessimist,,dance,Shelbyville,21:20:49,Wednesday


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65079 entries, 0 to 65078
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0     userID  65079 non-null  object
 1   Track     63736 non-null  object
 2   artist    57512 non-null  object
 3   genre     63881 non-null  object
 4     City    65079 non-null  object
 5   time      65079 non-null  object
 6   Day       65079 non-null  object
dtypes: object(7)
memory usage: 3.5+ MB


This table contains seven columns, all of which hold the same data type, namely: object.

According to the documentation:
- `'userID'`: user identifier
- `'Track'`: track title
- `'artist'`: artist name
- `'genre'`: music genre
- `'City'`: city where the user is located
- `'time'`: time when the track was played
- `'Day'`: name of the day

We can identify three issues with the column names:
1. Some column names are written in uppercase while others are written in lowercase.
2. Some column names contain spaces.
3. The terms "city", "time", and "day" are ambiguous as it is unclear whether they refer to the song or the user.

The number of values in each column is different, indicating that the data contains missing values.
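
The header problems listed above can also be found programmatically. The following is a minimal sketch, assuming header strings like the ones `df.columns` returns for this file; the `toy` frame is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical empty frame that only reproduces the header strings of the dataset.
toy = pd.DataFrame(columns=['  userID', 'Track', 'artist', 'genre', '  City  ', 'time', 'Day'])

# A name that changes after strip() has leading or trailing spaces.
padded = [name for name in toy.columns if name != name.strip()]
# A name that changes after lower() contains uppercase characters.
not_lower = [name for name in toy.columns if name != name.lower()]

print(padded)
print(not_lower)
```

Printing the names with `repr()` or as a list, rather than reading them from `df.info()`, makes the extra spaces visible.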

### Initial Conclusions <a id='data_review_conclusions'></a>

Each row in the table contains data on one played track. Some columns describe the song itself: track title, artist, and genre. The rest describe the user: their city, and the time and day they played the song.

It is evident that the data is sufficient for testing hypotheses, although there are missing values.

Next, we need to conduct data pre-processing before proceeding.

[Back to Table of Contents](#back)

## Stage 2. Data Preprocessing<a id='data_preprocessing'></a>

### Header Style <a id='header_style'></a>

In [None]:
df.columns

Index(['  userID', 'Track', 'artist', 'genre', '  City  ', 'time', 'Day'], dtype='object')

Rename the columns according to good naming style:
* Use snake_case for column names with multiple words.
* All characters should be in lowercase.
* Remove spaces.

In [None]:
df = df.rename(
    columns={
    '  userID' : 'user_id',
    'Track' : 'track',
    'artist' : 'artist',
    'genre' : 'genre',
    '  City  ' : 'user_city',
    'time' :'time_play',
    'Day' : 'day_play',
    }
)

In [None]:
df.columns  # check the result: list of column names

Index(['user_id', 'track', 'artist', 'genre', 'user_city', 'time_play',
       'day_play'],
      dtype='object')
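
As an alternative to listing every column by hand, the whitespace and case fixes can be applied to all headers at once, leaving only the genuinely new names for an explicit mapping. This is a sketch on a hypothetical `toy` frame with the same headers, not the approach used in the cells above.

```python
import pandas as pd

# Hypothetical empty frame that only reproduces the original header strings.
toy = pd.DataFrame(columns=['  userID', 'Track', 'artist', 'genre', '  City  ', 'time', 'Day'])

# Strip surrounding spaces and lowercase every header in one pass.
toy.columns = toy.columns.str.strip().str.lower()

# Names that need more than whitespace and case cleanup still get an explicit mapping.
toy = toy.rename(columns={
    'userid': 'user_id',
    'city': 'user_city',
    'time': 'time_play',
    'day': 'day_play',
})

print(list(toy.columns))
```

The generic pass scales better when a table has many columns, while the explicit mapping keeps the renamed columns easy to review.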

[Back to Table of Contents](#back)

### Missing Values <a id='missing_values'></a>

In [None]:
df.isna().sum()

user_id         0
track        1343
artist       7567
genre        1198
user_city       0
time_play       0
day_play        0
dtype: int64

Not all of the missing values affect the study. For example, missing values in `track` and `artist` are not critical and can simply be replaced with a clear marker.
However, missing values in `genre` can affect the comparison of music preferences in Springfield and Shelbyville. In real life, it would be useful to investigate why the values are missing and try to restore them, but that is not possible in this project. Therefore, I should:
* Fill the missing values with a marker.
* Evaluate how much the missing values can affect my calculations.

In [None]:
columns_to_replace = ['track', 'artist', 'genre']
df[columns_to_replace] = df[columns_to_replace].fillna('unknown')

<div class="alert alert-success">
<b>Reviewer's comment v1</b> <a class="tocSkip"></a>

The list `columns_to_replace` was defined without errors, so the missing values could be imputed smoothly.

</div>

In [None]:
df.isna().sum()

user_id      0
track        0
artist       0
genre        0
user_city    0
time_play    0
day_play     0
dtype: int64
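
One way to evaluate how much the `'unknown'` marker could affect the genre comparison is to look at its share of plays in each city. The sketch below uses a small hypothetical `toy` table with the cleaned column names; the numbers are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample with the cleaned column names.
toy = pd.DataFrame({
    'user_city': ['Springfield', 'Springfield', 'Springfield', 'Springfield',
                  'Shelbyville', 'Shelbyville'],
    'genre': ['pop', 'unknown', 'rock', 'dance', 'unknown', 'pop'],
})

# The mean of a boolean column is the fraction of True values,
# so this gives the share of plays with an unknown genre per city.
unknown_share = toy['genre'].eq('unknown').groupby(toy['user_city']).mean()

print(unknown_share)
```

Running the same expression on `df` would show whether the marker is frequent enough in either city to distort a top-genre ranking.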

[Back to Table of Contents](#back)

### Duplicated Data <a id='duplicates'></a>

In [None]:
df.duplicated().sum()

3826

In [None]:
df = df.drop_duplicates().reset_index(drop=True)

In [None]:
df.duplicated().sum()

0

In addition, I should remove implicit duplicates in the `genre` column. For instance, the same genre name may be written in different ways.

Preview the list of unique genre names, sorted alphabetically:
* Take the `genre` column of the DataFrame
* Apply a sorting method to it
* On the sorted column, call the method that returns all unique values in the column

In [None]:
df['genre'].unique()

array(['rock', 'pop', 'folk', 'dance', 'rusrap', 'ruspop', 'world',
       'electronic', 'unknown', 'alternative', 'children', 'rnb', 'hip',
       'jazz', 'postrock', 'latin', 'classical', 'metal', 'reggae',
       'triphop', 'blues', 'instrumental', 'rusrock', 'dnb', 'türk',
       'post', 'country', 'psychedelic', 'conjazz', 'indie',
       'posthardcore', 'local', 'avantgarde', 'punk', 'videogame',
       'techno', 'house', 'christmas', 'melodic', 'caucasian',
       'reggaeton', 'soundtrack', 'singer', 'ska', 'salsa', 'ambient',
       'film', 'western', 'rap', 'beats', "hard'n'heavy", 'progmetal',
       'minimal', 'tropical', 'contemporary', 'new', 'soul', 'holiday',
       'german', 'jpop', 'spiritual', 'urban', 'gospel', 'nujazz',
       'folkmetal', 'trance', 'miscellaneous', 'anime', 'hardcore',
       'progressive', 'korean', 'numetal', 'vocal', 'estrada', 'tango',
       'loungeelectronic', 'classicmetal', 'dubstep', 'club', 'deep',
       'southern', 'black', 'folkrock', 

In [None]:
df['genre'].sort_values().unique()

array(['acid', 'acoustic', 'action', 'adult', 'africa', 'afrikaans',
       'alternative', 'ambient', 'americana', 'animated', 'anime',
       'arabesk', 'arabic', 'arena', 'argentinetango', 'art', 'audiobook',
       'avantgarde', 'axé', 'baile', 'balkan', 'beats', 'bigroom',
       'black', 'bluegrass', 'blues', 'bollywood', 'bossa', 'brazilian',
       'breakbeat', 'breaks', 'broadway', 'cantautori', 'cantopop',
       'canzone', 'caribbean', 'caucasian', 'celtic', 'chamber',
       'children', 'chill', 'chinese', 'choral', 'christian', 'christmas',
       'classical', 'classicmetal', 'club', 'colombian', 'comedy',
       'conjazz', 'contemporary', 'country', 'cuban', 'dance',
       'dancehall', 'dancepop', 'dark', 'death', 'deep', 'deutschrock',
       'deutschspr', 'dirty', 'disco', 'dnb', 'documentary', 'downbeat',
       'downtempo', 'drum', 'dub', 'dubstep', 'eastern', 'easy',
       'electronic', 'electropop', 'emo', 'entehno', 'epicmetal',
       'estrada', 'ethnic', 'eurofo

Search through a list to find implicit duplicates of the hiphop genre. This could be misspelled names or alternative names for the same genre.

You will find the following implicit duplicates:

* `hip`
* `hop`
* `hip-hop`

To remove them, use the replace_wrong_genres() function with two parameters:

* `wrong_genres=` — a list of duplicates
* `correct_genre=` — a string with the correct value

The function should correct the names in the 'genre' column of the df table, replacing each value from the wrong_genres list with the value in correct_genre.

In [None]:
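def replace_wrong_genres(wrong_genres, correct_genre):
    # replace each misspelled or alternative name with the correct genre name
    for wrong_genre in wrong_genres:
        df['genre'] = df['genre'].replace(wrong_genre, correct_genre)

wrong = ['hip', 'hop', 'hip-hop']  # implicit duplicates of hiphop
correct = 'hiphop'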

In [None]:
replace_wrong_genres(wrong, correct)
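
`Series.replace` also accepts a list of values to replace, so the same cleanup can be written without a loop. A minimal sketch on a hypothetical `genres` Series:

```python
import pandas as pd

# Hypothetical genre values containing the three implicit duplicates.
genres = pd.Series(['hip', 'rock', 'hop', 'hip-hop', 'hiphop', 'pop'])

# Every value found in the list is replaced by the single correct name.
cleaned = genres.replace(['hip', 'hop', 'hip-hop'], 'hiphop')

print(cleaned.tolist())
print(cleaned.nunique())
```

Comparing `nunique()` before and after the replacement is a quick check that the variants have been merged.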

In [None]:
df['genre'].unique()  # check for implicit duplicates

array(['rock', 'pop', 'folk', 'dance', 'rusrap', 'ruspop', 'world',
       'electronic', 'unknown', 'alternative', 'children', 'rnb',
       'hiphop', 'jazz', 'postrock', 'latin', 'classical', 'metal',
       'reggae', 'triphop', 'blues', 'instrumental', 'rusrock', 'dnb',
       'türk', 'post', 'country', 'psychedelic', 'conjazz', 'indie',
       'posthardcore', 'local', 'avantgarde', 'punk', 'videogame',
       'techno', 'house', 'christmas', 'melodic', 'caucasian',
       'reggaeton', 'soundtrack', 'singer', 'ska', 'salsa', 'ambient',
       'film', 'western', 'rap', 'beats', "hard'n'heavy", 'progmetal',
       'minimal', 'tropical', 'contemporary', 'new', 'soul', 'holiday',
       'german', 'jpop', 'spiritual', 'urban', 'gospel', 'nujazz',
       'folkmetal', 'trance', 'miscellaneous', 'anime', 'hardcore',
       'progressive', 'korean', 'numetal', 'vocal', 'estrada', 'tango',
       'loungeelectronic', 'classicmetal', 'dubstep', 'club', 'deep',
       'southern', 'black', 'folkrock

In [None]:
df['genre'].sort_values().unique()

array(['acid', 'acoustic', 'action', 'adult', 'africa', 'afrikaans',
       'alternative', 'ambient', 'americana', 'animated', 'anime',
       'arabesk', 'arabic', 'arena', 'argentinetango', 'art', 'audiobook',
       'avantgarde', 'axé', 'baile', 'balkan', 'beats', 'bigroom',
       'black', 'bluegrass', 'blues', 'bollywood', 'bossa', 'brazilian',
       'breakbeat', 'breaks', 'broadway', 'cantautori', 'cantopop',
       'canzone', 'caribbean', 'caucasian', 'celtic', 'chamber',
       'children', 'chill', 'chinese', 'choral', 'christian', 'christmas',
       'classical', 'classicmetal', 'club', 'colombian', 'comedy',
       'conjazz', 'contemporary', 'country', 'cuban', 'dance',
       'dancehall', 'dancepop', 'dark', 'death', 'deep', 'deutschrock',
       'deutschspr', 'dirty', 'disco', 'dnb', 'documentary', 'downbeat',
       'downtempo', 'drum', 'dub', 'dubstep', 'eastern', 'easy',
       'electronic', 'electropop', 'emo', 'entehno', 'epicmetal',
       'estrada', 'ethnic', 'eurofo

[Back to Table of Contents](#back)

### Data Preprocessing Conclusions <a id='data_preprocessing_conclusions'></a>
We detected three issues with the data:

- Incorrect header style
- Missing values
- Explicit and implicit duplicates

The headers have now been cleaned up to make the table easier to process.
All missing values have been replaced with `'unknown'`. However, we still have to check whether the missing values in `'genre'` affect our calculations.

The absence of duplicates will make the results more accurate and easier to understand.

Now we can move on to testing the hypotheses.

[Back to Table of Contents](#back)

## Stage 3. Hypothesis Testing <a id='hypotheses'></a>

### Hypothesis 1: User Activity in Two Cities <a id='activity'></a>

According to the first hypothesis, users from Springfield and Shelbyville behave differently when listening to music. This hypothesis is tested using data from three days: Monday, Wednesday, and Friday.

* Separate users into groups based on their city.
* Compare the number of songs played by each group on Monday, Wednesday, and Friday.

In [None]:
df.groupby('user_city')['track'].count()

user_city
Shelbyville    18512
Springfield    42741
Name: track, dtype: int64

Springfield has more tracks played than Shelbyville. However, this does not mean that Springfield residents listen to music more often. The city is simply bigger and has more users.

In [None]:
# Example of manual filtering: Shelbyville plays, then Shelbyville plays on Monday
df_sv = df[df['user_city'] == 'Shelbyville']
df_sv_monday = df_sv[df_sv['day_play'] == 'Monday']

In [None]:
df.groupby('day_play')['track'].count()

day_play
Friday       21840
Monday       21354
Wednesday    18059
Name: track, dtype: int64

Wednesday is the quietest day overall. However, if we consider the two cities separately, we may come to a different conclusion.

In [None]:
def number_tracks(day_play, user_city):
    track_list = df[df['user_city'] == user_city]
    track_list = track_list[track_list['day_play'] == day_play]
    track_list_count = track_list['user_id'].count()
    return track_list_count

In [None]:
number_tracks('Monday', 'Springfield')

15740

In [None]:
number_tracks('Monday', 'Shelbyville')

5614

In [None]:
number_tracks('Wednesday', 'Springfield')

11056

In [None]:
number_tracks('Wednesday', 'Shelbyville')

7003

In [None]:
number_tracks('Friday', 'Springfield')

15945

In [None]:
number_tracks('Friday', 'Shelbyville')

5895

In [None]:
# Use a separate name for the data so the number_tracks() function is not overwritten
number_tracks_data = [
    ['Springfield',
     number_tracks('Monday', 'Springfield'),
     number_tracks('Wednesday', 'Springfield'),
     number_tracks('Friday', 'Springfield')],
    ['Shelbyville',
     number_tracks('Monday', 'Shelbyville'),
     number_tracks('Wednesday', 'Shelbyville'),
     number_tracks('Friday', 'Shelbyville')],
]

columns_name = ['city', 'monday', 'wednesday', 'friday']

number_tracks_table = pd.DataFrame(data=number_tracks_data, columns=columns_name)
number_tracks_table

Unnamed: 0,city,monday,wednesday,friday
0,Springfield,15740,11056,15945
1,Shelbyville,5614,7003,5895
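
The same city-by-day table can be built in a single call with `pd.crosstab` instead of six separate function calls. This is a sketch on a hypothetical `toy` table with the cleaned column names, not a replacement for the cells above.

```python
import pandas as pd

# Hypothetical plays; only the two columns needed for the count are included.
toy = pd.DataFrame({
    'user_city': ['Springfield', 'Springfield', 'Shelbyville', 'Springfield', 'Shelbyville'],
    'day_play': ['Monday', 'Friday', 'Monday', 'Monday', 'Wednesday'],
})

days = ['Monday', 'Wednesday', 'Friday']

# crosstab counts the rows for every (city, day) pair;
# reindex puts the days in calendar order and fills absent days with 0.
plays = pd.crosstab(toy['user_city'], toy['day_play']).reindex(columns=days, fill_value=0)

print(plays)
```

Because the counts come from one grouped operation, adding another city or day does not require any extra code.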


**Conclusions:**

The data reveals differences in user behavior:

- In Springfield, the number of songs played peaks on Mondays and Fridays, while there is a decrease in activity on Wednesdays.
- In Shelbyville, users listen to more music on Wednesdays. User activity is lower on Mondays and Fridays.
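
Because Springfield has far more plays overall, the pattern is easier to compare when each day is expressed as a share of the city's own total. The sketch below reuses the counts from the table above; the variable names `counts` and `shares` are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the city-by-day table above.
counts = pd.DataFrame(
    {'monday': [15740, 5614], 'wednesday': [11056, 7003], 'friday': [15945, 5895]},
    index=['Springfield', 'Shelbyville'],
)

# Divide each row by its own total so every row sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0).round(3)

print(shares)
print(shares.idxmin(axis=1))  # least active day per city
print(shares.idxmax(axis=1))  # most active day per city
```

With shares, the difference in city size no longer hides the opposite weekly patterns of the two cities.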

[Back to Table of Contents](#back)

### Hypothesis 2: Music Preferences on Monday and Friday <a id='week'></a>

According to the second hypothesis, on Monday morning and Friday evening, residents of Springfield listen to different genres than residents of Shelbyville.

In [None]:
spr_general = df[df['user_city'] == 'Springfield']
spr_general

Unnamed: 0,index,user_id,track,artist,genre,user_city,time_play,day_play
1,1,55204538,Delayed Because of Accident,Andreas Rönnberg,rock,Springfield,14:07:09,Friday
4,4,E2DC1FAE,Soul People,Space Echo,dance,Springfield,08:34:34,Monday
6,6,4CB90AA5,True,Roman Messer,dance,Springfield,13:00:07,Wednesday
7,7,F03E1C1F,Feeling This Way,Polina Griffith,dance,Springfield,20:47:49,Wednesday
8,8,8FA1D3BE,L’estate,Julia Dalia,ruspop,Springfield,09:17:40,Friday
...,...,...,...,...,...,...,...,...
61247,65073,83A474E7,I Worship Only What You Bleed,The Black Dahlia Murder,extrememetal,Springfield,21:07:12,Monday
61248,65074,729CBB09,My Name,McLean,rnb,Springfield,13:32:28,Wednesday
61250,65076,C5E3A0D5,Jalopiina,unknown,industrial,Springfield,20:09:26,Friday
61251,65077,321D0506,Freight Train,Chas McDevitt,rock,Springfield,21:43:59,Friday


In [None]:
shel_general = df[df['user_city'] == 'Shelbyville']
shel_general

Unnamed: 0,index,user_id,track,artist,genre,user_city,time_play,day_play
0,0,FFB692EC,Kamigata To Boots,The Mass Missile,rock,Shelbyville,20:28:33,Wednesday
2,2,20EC38,Funiculì funiculà,Mario Lanza,pop,Shelbyville,20:58:07,Wednesday
3,3,A3DD03C9,Dragons in the Sunset,Fire + Ice,folk,Shelbyville,08:37:09,Monday
5,5,842029A1,Chains,Obladaet,rusrap,Shelbyville,13:09:41,Friday
9,9,E772D5C0,Pessimist,unknown,dance,Shelbyville,21:20:49,Wednesday
...,...,...,...,...,...,...,...,...
61239,65063,D94F810B,Theme from the Walking Dead,Proyecto Halloween,film,Shelbyville,21:14:40,Monday
61240,65064,BC8EC5CF,Red Lips: Gta (Rover Rework),Rover,electronic,Shelbyville,21:06:50,Monday
61241,65065,29E04611,Bre Petrunko,Perunika Trio,world,Shelbyville,13:56:00,Monday
61242,65066,1B91C621,(Hello) Cloud Mountain,sleepmakeswaves,postrock,Shelbyville,09:22:13,Monday


<div class="alert alert-success">
<b>Reviewer's comment v1</b> <a class="tocSkip"></a>

Everything works well!

</div>

Write a function named `genre_weekday()` with four parameters:
* Day name
* Starting time stamp, in the format 'hh:mm'
* Ending time stamp, in the format 'hh:mm'
* A table with the data for one city

The function should return the 15 most popular genres on the given day during the period between the two time stamps.

In [None]:
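def genre_weekday(day_play, time1, time2, city_df):
    # keep only the plays on the requested day
    genre_df = city_df[city_df['day_play'] == day_play]
    # keep only the plays between time1 and time2 (zero-padded time strings compare correctly)
    genre_df = genre_df[genre_df['time_play'] < time2]
    genre_df = genre_df[genre_df['time_play'] > time1]
    # count plays per genre and sort from most to least popular
    genre_df_grouped = genre_df.groupby('genre')['genre'].count()
    genre_df_sorted = genre_df_grouped.sort_values(ascending=False)
    return genre_df_sorted[:15]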

<div class="alert alert-success">
<b>Reviewer's comment v1</b> <a class="tocSkip"></a>

The function has been defined correctly.

</div>
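
The time filter in `genre_weekday()` compares strings, not time objects. This works because zero-padded 'hh:mm:ss' strings sort in the same order as the times they represent. A minimal sketch on a hypothetical `toy` table shows which rows a 07:00 to 11:00 window keeps:

```python
import pandas as pd

# Hypothetical plays around the edges of the 07:00 to 11:00 window.
toy = pd.DataFrame({
    'genre': ['pop', 'rock', 'pop', 'dance'],
    'time_play': ['06:59:59', '07:00:01', '10:59:59', '11:00:00'],
})

# String comparison is lexicographic: '07:00:01' > '07:00' because the shared
# prefix is equal and the left string is longer; '11:00:00' < '11:00' is False
# for the same reason, so the play at exactly 11:00:00 is excluded.
in_window = toy[(toy['time_play'] > '07:00') & (toy['time_play'] < '11:00')]

print(in_window)
```

The comparison would stop being reliable if the times were not zero-padded (for example `'7:00:01'`), in which case converting the column to a time type first would be safer.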

Compare the results of the genre_weekday() function for Springfield and Shelbyville on Monday morning (from 07:00 to 11:00) and on Friday night (from 17:00 to 23:00):

In [None]:
genre_weekday('Monday', '07:00', '11:00', spr_general)

genre
pop            781
dance          549
electronic     480
rock           474
hiphop         286
ruspop         186
world          181
rusrap         175
alternative    164
unknown        161
classical      157
metal          120
jazz           100
folk            97
soundtrack      95
Name: genre, dtype: int64

In [None]:
genre_weekday('Monday', '07:00', '11:00', shel_general)

genre
pop            218
dance          182
rock           162
electronic     147
hiphop          80
ruspop          64
alternative     58
rusrap          55
jazz            44
classical       40
world           36
rap             32
soundtrack      31
rnb             27
metal           27
Name: genre, dtype: int64

In [None]:
genre_weekday('Friday', '17:00', '23:00', spr_general)

genre
pop            713
rock           517
dance          495
electronic     482
hiphop         273
world          208
ruspop         170
classical      163
alternative    163
rusrap         142
jazz           111
unknown        110
soundtrack     105
rnb             90
metal           88
Name: genre, dtype: int64

In [None]:
genre_weekday('Friday', '07:00', '11:00', shel_general)

genre
pop            211
dance          192
electronic     167
rock           156
hiphop         109
classical       56
alternative     55
rusrap          55
world           46
ruspop          45
metal           42
latin           41
rap             36
rnb             33
jazz            32
Name: genre, dtype: int64

**Conclusions:**

After comparing the top 15 genres on Monday morning, we can draw the following conclusions:

1. Users from Springfield and Shelbyville listen to the same genres. The top five genres are the same, with only rock and electronic swapping places.

2. In Springfield, the number of missing values is significant: the value `'unknown'` ranks tenth. Missing values therefore make up a considerable share of the data, which raises questions about the accuracy of our conclusions.

For Friday, the situation is similar. The positions of individual genres vary, but overall the top 15 genres in the two cities are largely the same. Note that the Shelbyville call above uses the 07:00 to 11:00 window rather than 17:00 to 23:00, so the Friday comparison should be rerun with the evening window before it is relied on.

Thus, the second hypothesis is only partially confirmed:

* Users listen to similar music at the beginning and end of the week.
* There is no striking difference between Springfield and Shelbyville. In both cities, pop is the most popular genre.

However, the number of missing values raises questions about these results. In Springfield, there are so many of them that they affect our top 15. If these values were not missing, the results might be different.
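
The similarity of the two Monday-morning rankings can be checked directly by comparing the sets of genre names. The lists below are copied from the two Monday outputs above; the variable names are introduced here for illustration.

```python
# Top 15 genres on Monday morning, in the order shown in the outputs above.
spr_monday = ['pop', 'dance', 'electronic', 'rock', 'hiphop', 'ruspop', 'world', 'rusrap',
              'alternative', 'unknown', 'classical', 'metal', 'jazz', 'folk', 'soundtrack']
shel_monday = ['pop', 'dance', 'rock', 'electronic', 'hiphop', 'ruspop', 'alternative', 'rusrap',
               'jazz', 'classical', 'world', 'rap', 'soundtrack', 'rnb', 'metal']

shared = set(spr_monday) & set(shel_monday)       # genres in both top 15 lists
only_spr = set(spr_monday) - set(shel_monday)     # genres only in Springfield's list
only_shel = set(shel_monday) - set(spr_monday)    # genres only in Shelbyville's list

print(len(shared), sorted(only_spr), sorted(only_shel))
```

Thirteen of the fifteen genres are shared. Springfield's list additionally contains `unknown` and `folk`, while Shelbyville's contains `rap` and `rnb`.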

[Back to Table of Contents](#back)

### Hypothesis 3: Genre Preferences in Springfield and Shelbyville <a id='genre'></a>

Hypothesis 3: Shelbyville loves rap music, while Springfield prefers pop music.

Group the `spr_general` table by genre and count the tracks played for each genre using the `count()` method. Then sort the result in descending order and save it to `spr_genres`.

In [None]:
spr_genres = spr_general.groupby('genre')['genre'].count()
spr_genres = spr_genres.sort_values(ascending = False)

In [None]:
spr_genres.head(10)  # show the first 10 rows of spr_genres

genre
pop            5892
dance          4435
rock           3965
electronic     3786
hiphop         2096
classical      1616
world          1432
alternative    1379
ruspop         1372
rusrap         1161
Name: genre, dtype: int64

Do the same for the Shelbyville data.

In [None]:
shel_genres = shel_general.groupby('genre')['genre'].count()
shel_genres = shel_genres.sort_values(ascending = False)

In [None]:
shel_genres.head(10)

genre
pop            2431
dance          1932
rock           1879
electronic     1736
hiphop          960
alternative     649
classical       646
rusrap          564
ruspop          538
world           515
Name: genre, dtype: int64

**Conclusions:**

The hypothesis is partially confirmed:

* Pop music is the most popular genre in Springfield, as expected.
* However, pop music is just as popular in Shelbyville, and rap does not make the top 5 in either city.
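
Raw counts cannot be compared directly between cities of different size, so it helps to express pop as a share of all plays in each city. The sketch below uses the pop counts from the two outputs above and the per-city totals from the Hypothesis 1 section; the dictionary names are introduced here for illustration.

```python
# Total plays per city (from the groupby on user_city in the Hypothesis 1 section).
totals = {'Springfield': 42741, 'Shelbyville': 18512}
# Pop plays per city (first row of spr_genres and shel_genres).
pop_plays = {'Springfield': 5892, 'Shelbyville': 2431}

# Share of pop among all plays in each city.
pop_share = {city: pop_plays[city] / totals[city] for city in totals}

for city, share in pop_share.items():
    print(f'{city}: {share:.1%}')
```

Pop accounts for roughly 13 to 14 percent of plays in both cities, which supports the conclusion that its popularity is about the same in each.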

[Back to Table of Contents](#back)

# Findings <a id='end'></a>

We have tested the following three hypotheses:

1. User activity varies depending on the day and the city.
2. On Monday morning, Springfield and Shelbyville residents listen to different genres. This also applies to Friday night.
3. Listeners in Springfield and Shelbyville have different preferences: Springfield prefers pop music, while Shelbyville prefers rap.

After analyzing the data, we can conclude:

1. User activity in Springfield and Shelbyville depends on both the day and the city.

    The first hypothesis can be fully accepted.


2. Music preferences do not differ significantly over the week in Springfield and Shelbyville. There is a small difference in the ranking on Monday, but in both Springfield and Shelbyville people listen to pop music the most.

    So this hypothesis cannot be fully accepted. We should also remember that the results might have been different if not for the missing values.


3. It turns out that the music preferences of users from Springfield and Shelbyville are very similar.

    The third hypothesis is rejected. If there are differences in preferences, they cannot be seen in this data.
    
Based on the analysis, the marketing team can consider the following recommendations:

1. Since user activity varies depending on the day, the marketing team can plan their campaigns accordingly. For instance, they can run ads on Monday mornings targeting Springfield and Shelbyville residents based on their respective music genres.

2. As pop music is the most preferred genre in both Springfield and Shelbyville, the marketing team can focus on promoting pop music-related products or services in these cities. They can also consider partnering with popular pop music artists to attract more listeners.

3. The marketing team should be cautious while making assumptions about user preferences based on the limited data available. They can conduct more research to obtain a more comprehensive understanding of user behavior and preferences.

[Back to Table of Contents](#back)