## Spotify Charts with Spark
First, download the dataset from https://www.kaggle.com/datasets/dhruvildave/spotify-charts, where it is listed as charts.csv (3.48 GB). The file is large, but the data is cut down during this project, and because Apache Spark is used it doesn't take too long to load. Once the download finishes, extract the csv file from its zip file and put it in the same folder as this file. You can then run each command individually.
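If you would rather do the extraction from Python, a minimal sketch could look like the one below. The archive name `archive.zip` is an assumption (use whatever name your download has), and `extract_charts` is a hypothetical helper that is not part of this project.

```python
import zipfile
from pathlib import Path

def extract_charts(zip_path, dest_dir='.', member='charts.csv'):
    """Extract `member` from the zip at `zip_path` into `dest_dir` and return its path."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        # Fail early with a clear message if the csv is not inside the archive
        if member not in zf.namelist():
            raise FileNotFoundError(f'{member} not found in {zip_path}')
        zf.extract(member, path=dest)
    return dest / member

# Example call (assumed archive name, adjust to your download):
# extract_charts('archive.zip')
```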

In [None]:
import pyspark
spark = pyspark.sql.SparkSession.builder.getOrCreate()
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, TimestampType, ArrayType, DateType
from pyspark.sql import functions as F
from pyspark.sql.window import Window
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sc = spark.sparkContext

In [None]:
schema = StructType([
    StructField('title', StringType(), True),
    StructField('rank', IntegerType(), True),
    StructField('date', StringType(), True),
    StructField('artist', StringType(), True),
    StructField('url', StringType(), True),
    StructField('region', StringType(), True),
    StructField('chart', StringType(), True),
    StructField('trend', StringType(), True),
    StructField('streams', IntegerType(), True)
])

In [None]:
df = spark.read.csv('charts.csv', header=True, schema=schema, timestampFormat='yyyy-MM-dd')

#Keep only the top200 chart for the ten chosen countries: United States, Switzerland, Australia, Brazil, Germany, United Kingdom, Sweden, Austria, Uruguay, and Chile
countries = ['United States', 'Switzerland', 'Australia', 'Brazil', 'Germany', 'United Kingdom', 'Sweden', 'Austria', 'Uruguay', 'Chile']
dim = df.filter((df.chart == 'top200') & df.region.isin(countries))
#Add a month column (yyyy-MM) and a year column (yyyy), both derived from the date
df_month = dim.withColumn('month', F.date_format(F.col("date"), "yyyy-MM"))
df_year = df_month.withColumn('year', F.date_format(F.col("date"), "yyyy"))

#These DataFrames hold the Global region values that are compared against the ten chosen countries
#The month (yyyy-MM) and year (yyyy) columns are derived from the date in the same way
glob = df.filter((df.chart == 'top200') & (df.region == 'Global'))
df_glob_month = glob.withColumn('month', F.date_format(F.col("date"), "yyyy-MM"))
df_glob_year = glob.withColumn('year', F.date_format(F.col("date"), "yyyy"))

In [None]:
dim.filter((dim.title == 'Bad and Boujee (feat. Lil Uzi Vert)') & (dim.region == 'United States') & (dim.artist == 'Migos') & (dim.date.startswith('2017-01'))).show(10)

### The Sum
This section has three parts with similar pieces. The first part works on the individual countries, the second part is the checker, which sums the values of the ten countries, and the third part is the global value. Each part has four pieces, and every piece finds the summed total number of streams. The first piece groups by song title and month, the second by song title and year, the third by artist and month, and the fourth by artist and year.
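To make the four pieces concrete, here is a plain-Python sketch (no Spark needed) of the same group-and-sum idea on a few made-up rows. The rows, the stream counts, and the helper `sum_streams` are illustrative assumptions. The notebook itself does this with `groupBy(...).agg(F.sum('streams'))`.

```python
from collections import defaultdict

# Made-up rows: (title, artist, date, region, streams)
rows = [
    ('Song A', 'Artist X', '2017-01-01', 'United States', 100),
    ('Song A', 'Artist X', '2017-01-02', 'United States', 150),
    ('Song A', 'Artist X', '2017-02-01', 'United States', 50),
    ('Song B', 'Artist X', '2017-01-01', 'United States', 70),
    ('Song A', 'Artist X', '2017-01-01', 'Chile', 30),
]

def sum_streams(rows, by_title, period):
    """Sum streams per (title if by_title, artist, period, region); period is 'month' or 'year'."""
    cut = 7 if period == 'month' else 4   # 'yyyy-MM' or 'yyyy' prefix of the date string
    totals = defaultdict(int)
    for title, artist, date, region, streams in rows:
        key = (artist, date[:cut], region)
        if by_title:
            key = (title,) + key
        totals[key] += streams
    return dict(totals)

title_month = sum_streams(rows, by_title=True, period='month')
artist_year = sum_streams(rows, by_title=False, period='year')
print(title_month[('Song A', 'Artist X', '2017-01', 'United States')])  # 250
print(artist_year[('Artist X', '2017', 'United States')])               # 370
```

The checker pieces would drop `region` from the key, so the ten countries are summed together.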

In [None]:
#1 Country
#Group by title, artist, month, and region, then sum the streams into total_streams for each region
month = df_month.groupBy(['title','artist','month','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by month ascending, then by total_streams descending
month = month.orderBy(month.month.asc(),month.total_streams.desc())

In [None]:
#2 Country
#Group by title, artist, year, and region, then sum the streams into total_streams for each region
year = df_year.groupBy(['title','artist','year','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by year ascending, then by total_streams descending
year = year.orderBy(year.year.asc(),year.total_streams.desc())

In [None]:
#3 Country
#Group by artist, month, and region, then sum the streams into total_streams for each region
artist_month = df_month.groupBy(['artist','month','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by month ascending, then by total_streams descending
artist_month = artist_month.orderBy(artist_month.month.asc(), artist_month.total_streams.desc())

In [None]:
#4 Country
#Group by artist, year, and region, then sum the streams into total_streams for each region
artist_year = df_year.groupBy(['artist','year','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by year ascending, then by total_streams descending
artist_year = artist_year.orderBy(artist_year.year.asc(), artist_year.total_streams.desc())

In [None]:
#1 The Checker for the Song Title by Month
#Group by title, artist, and month across the ten countries, then sum the streams into total_streams to check against the global value
chk_month = df_month.groupBy(['title','artist','month']).agg(F.sum('streams').alias('total_streams'))
#Sort by month ascending, then by total_streams descending
chk_month = chk_month.orderBy(chk_month.month.asc(), chk_month.total_streams.desc())

In [None]:
#2 The Checker for the Song Title by Year
#Group by title, artist, and year across the ten countries, then sum the streams into total_streams to check against the global value
chk_year = df_year.groupBy(['title','artist','year']).agg(F.sum('streams').alias('total_streams'))
#Sort by year ascending, then by total_streams descending
chk_year = chk_year.orderBy(chk_year.year.asc(), chk_year.total_streams.desc())

In [None]:
#3 The Checker for the Artist by Month
#Group by artist and month across the ten countries, then sum the streams into total_streams to check against the global value
artist_chk_month = df_month.groupBy(['artist','month']).agg(F.sum('streams').alias('total_streams'))
#Sort by month ascending, then by total_streams descending
artist_chk_month = artist_chk_month.orderBy(artist_chk_month.month.asc(), artist_chk_month.total_streams.desc())

In [None]:
#4 The Checker for the Artist by Year
#Group by artist and year across the ten countries, then sum the streams into total_streams to check against the global value
artist_chk_year = df_year.groupBy(['artist','year']).agg(F.sum('streams').alias('total_streams'))
#Sort by year ascending, then by total_streams descending
artist_chk_year = artist_chk_year.orderBy(artist_chk_year.year.asc(), artist_chk_year.total_streams.desc())

In [None]:
#1G
#Group by title, artist, month, and region, then sum the streams into total_streams for the global value
glob_month = df_glob_month.groupBy(['title','artist','month','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by month ascending, then by total_streams descending
glob_month = glob_month.orderBy(glob_month.month.asc(), glob_month.total_streams.desc())

In [None]:
#2G
#Group by title, artist, year, and region, then sum the streams into total_streams for the global value
glob_year = df_glob_year.groupBy(['title','artist','year','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by year ascending, then by total_streams descending
glob_year = glob_year.orderBy(glob_year.year.asc(), glob_year.total_streams.desc())

In [None]:
#3G
#Group by artist, month, and region, then sum the streams into total_streams for the global value
artist_glob_month = df_glob_month.groupBy(['artist','month','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by month ascending, then by total_streams descending
artist_glob_month = artist_glob_month.orderBy(artist_glob_month.month.asc(), artist_glob_month.total_streams.desc())

In [None]:
#4G
#Group by artist, year, and region, then sum the streams into total_streams for the global value
artist_glob_year = df_glob_year.groupBy(['artist','year','region']).agg(F.sum('streams').alias('total_streams'))
#Sort by year ascending, then by total_streams descending
artist_glob_year = artist_glob_year.orderBy(artist_glob_year.year.asc(), artist_glob_year.total_streams.desc())

### The Sort
This section ranks the summed data within each date and region (date only for the checker) by total_streams in descending order. It has the same three parts as the previous section, and comments show which one is which. Each part creates two DataFrames: the `topTwo...` DataFrames hold the top 20 and the `topFive...` DataFrames hold the top 5.
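One detail worth knowing: `F.rank()` gives tied rows the same rank and then skips the following ranks, so a `rank <= 5` filter can return more than five rows when totals tie. The plain-Python sketch below mimics that behaviour on made-up totals. `rank_desc` is a hypothetical helper used only for illustration.

```python
def rank_desc(totals):
    """Return (name, total, rank) tuples ranked like SQL RANK() over total descending."""
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    ranked = []
    for position, (name, total) in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == total:
            rank = ranked[-1][2]      # tie: reuse the previous rank
        else:
            rank = position           # otherwise the rank is the 1-based position
        ranked.append((name, total, rank))
    return ranked

# Made-up totals with a tie for second place
totals = {'Song A': 500, 'Song B': 400, 'Song C': 400, 'Song D': 300}
for row in rank_desc(totals):
    print(row)
# ('Song A', 500, 1)
# ('Song B', 400, 2)
# ('Song C', 400, 2)
# ('Song D', 300, 4)
```

If exactly N rows per group are needed, `F.row_number()` is the usual alternative, at the cost of breaking ties arbitrarily.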

In [None]:
#1 Country
win_month = Window.partitionBy(month['month'],month['region']).orderBy(month['total_streams'].desc())
topTwoMonth = month.select('*', F.rank().over(win_month).alias('rank')).filter(F.col('rank') <= 20)
topFiveMonth = month.select('*', F.rank().over(win_month).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#2 Country
win_year = Window.partitionBy(year['year'],year['region']).orderBy(year['total_streams'].desc())
topTwoYear = year.select('*', F.rank().over(win_year).alias('rank')).filter(F.col('rank') <= 20)
topFiveYear = year.select('*', F.rank().over(win_year).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#3 Country
artist_win_month = Window.partitionBy(artist_month['month'],artist_month['region']).orderBy(artist_month['total_streams'].desc())
topTwoArtistMonth = artist_month.select('*', F.rank().over(artist_win_month).alias('rank')).filter(F.col('rank') <= 20)
topFiveArtistMonth = artist_month.select('*', F.rank().over(artist_win_month).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#4 Country
artist_win_year = Window.partitionBy(artist_year['year'],artist_year['region']).orderBy(artist_year['total_streams'].desc())
topTwoArtistYear = artist_year.select('*', F.rank().over(artist_win_year).alias('rank')).filter(F.col('rank') <= 20)
topFiveArtistYear = artist_year.select('*', F.rank().over(artist_win_year).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#1 The Checker for the Song Title by Month
win_chk_month = Window.partitionBy(chk_month['month']).orderBy(chk_month['total_streams'].desc())
topTwoChkMonth = chk_month.select('*', F.rank().over(win_chk_month).alias('rank')).filter(F.col('rank') <= 20)
topFiveChkMonth = chk_month.select('*', F.rank().over(win_chk_month).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#2 The Checker for the Song Title by Year
win_chk_year = Window.partitionBy(chk_year['year']).orderBy(chk_year['total_streams'].desc())
topTwoChkYear = chk_year.select('*', F.rank().over(win_chk_year).alias('rank')).filter(F.col('rank') <= 20)
topFiveChkYear = chk_year.select('*', F.rank().over(win_chk_year).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#3 The Checker for the Artist by Month
artist_win_chk_month = Window.partitionBy(artist_chk_month['month']).orderBy(artist_chk_month['total_streams'].desc())
topTwoArtistChkMonth = artist_chk_month.select('*', F.rank().over(artist_win_chk_month).alias('rank')).filter(F.col('rank') <= 20)
topFiveArtistChkMonth = artist_chk_month.select('*', F.rank().over(artist_win_chk_month).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#4 The Checker for the Artist by Year 
artist_win_chk_year = Window.partitionBy(artist_chk_year['year']).orderBy(artist_chk_year['total_streams'].desc())
topTwoArtistChkYear = artist_chk_year.select('*', F.rank().over(artist_win_chk_year).alias('rank')).filter(F.col('rank') <= 20)
topFiveArtistChkYear = artist_chk_year.select('*', F.rank().over(artist_win_chk_year).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#1 Global
glob_win_month = Window.partitionBy(glob_month['month'],glob_month['region']).orderBy(glob_month['total_streams'].desc())
topTwoGlobMonth = glob_month.select('*', F.rank().over(glob_win_month).alias('rank')).filter(F.col('rank') <= 20)
topFiveGlobMonth = glob_month.select('*', F.rank().over(glob_win_month).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#2 Global
glob_win_year = Window.partitionBy(glob_year['year'], glob_year['region']).orderBy(glob_year['total_streams'].desc())
topTwoGlobYear = glob_year.select('*', F.rank().over(glob_win_year).alias('rank')).filter(F.col('rank') <= 20)
topFiveGlobYear = glob_year.select('*', F.rank().over(glob_win_year).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#3 Global
artist_glob_win_month = Window.partitionBy(artist_glob_month['month'], artist_glob_month['region']).orderBy(artist_glob_month['total_streams'].desc())
topTwoGlobArtistMonth = artist_glob_month.select('*', F.rank().over(artist_glob_win_month).alias('rank')).filter(F.col('rank') <= 20)
topFiveGlobArtistMonth = artist_glob_month.select('*', F.rank().over(artist_glob_win_month).alias('rank')).filter(F.col('rank') <= 5)

In [None]:
#4 Global
artist_glob_win_year = Window.partitionBy(artist_glob_year['year'], artist_glob_year['region']).orderBy(artist_glob_year['total_streams'].desc())
topTwoGlobArtistYear = artist_glob_year.select('*', F.rank().over(artist_glob_win_year).alias('rank')).filter(F.col('rank') <= 20)
topFiveGlobArtistYear = artist_glob_year.select('*', F.rank().over(artist_glob_win_year).alias('rank')).filter(F.col('rank') <= 5)

### The Pandas
As before, this has three sections: country, global, and the checker (the summed value of the ten countries). Each section takes the top 5 and top 20 DataFrames generated above and turns them into Pandas DataFrames so they can be plotted as bar graphs. The top 5 Pandas DataFrames are used to create bar graphs showing the top 5 songs and artists for every month and year. Since these top 5 graphs are large, they are saved to jpg files so they are easier to read.
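The figure widths used below are hard-coded for each graph. As a rough sketch of an alternative, the width could be derived from the number of bars. The helper `bar_figsize` and its default of 0.5 inches per bar are assumptions for illustration, and the pixel cap reflects my understanding that Matplotlib's Agg renderer rejects images of 2^16 pixels or more per side.

```python
def bar_figsize(n_bars, inches_per_bar=0.5, height=10, dpi=100, max_pixels=2**16):
    """Pick a (width, height) in inches where the width grows with the number of bars."""
    width = max(10, n_bars * inches_per_bar)
    # Stay below the assumed renderer limit of max_pixels per side at the given dpi
    max_width = (max_pixels - 1) / dpi
    return (min(width, max_width), height)

print(bar_figsize(10))    # (10, 10)
print(bar_figsize(100))   # (50.0, 10)

# Possible use with one of the DataFrames below (not run here):
# tfMonthAll.plot(kind='bar', x='title', y='total_streams', figsize=bar_figsize(len(tfMonthAll)))
```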

In [None]:
#1 Country
tfMonth = topFiveMonth.limit(10).toPandas()
ttMonthAll = topTwoMonth.toPandas()
tfMonthAll = topFiveMonth.toPandas()

In [None]:
tfmaAx = tfMonthAll.plot(kind='bar',x='title',y='total_streams',figsize=(500,10))
tfmaFig = tfmaAx.get_figure()
tfmaFig.savefig('TitleMonthNum1.jpg')

In [None]:
#2 Country
tfYear = topFiveYear.limit(10).toPandas()
ttYearAll = topTwoYear.toPandas()
tfYearAll = topFiveYear.toPandas()

In [None]:
tfyaAx = tfYearAll.plot(kind='bar',x='title',y='total_streams',figsize=(200,10))
tfyaFig = tfyaAx.get_figure()
tfyaFig.savefig('TitleYearNum2.jpg')

In [None]:
#3 Country
tfaMonth = topFiveArtistMonth.limit(10).toPandas()
ttaMonthAll = topTwoArtistMonth.toPandas()
tfaMonthAll = topFiveArtistMonth.toPandas()

In [None]:
tfamaAx = tfaMonthAll.plot(kind='bar',x='artist',y='total_streams',figsize=(500,10))
tfamaFig = tfamaAx.get_figure()
tfamaFig.savefig('ArtistMonthNum3.jpg')

In [None]:
#4 Country
tfaYear = topFiveArtistYear.limit(10).toPandas()
ttaYearAll = topTwoArtistYear.toPandas()
tfaYearAll = topFiveArtistYear.toPandas()

In [None]:
tfayaAx = tfaYearAll.plot(kind='bar',x='artist',y='total_streams',figsize=(280,10))
tfayaFig = tfayaAx.get_figure()
tfayaFig.savefig('ArtistYearNum4.jpg')

In [None]:
#1 The Checker
tfcMonth = topFiveChkMonth.limit(10).toPandas()
ttcMonthAll = topTwoChkMonth.toPandas()
tfcMonthAll = topFiveChkMonth.toPandas()

In [None]:
tfcmaAx = tfcMonthAll.plot(kind='bar',x='title',y='total_streams',figsize=(500,10))
tfcmaFig = tfcmaAx.get_figure()
tfcmaFig.savefig('CheckMonthNum1.jpg')

In [None]:
#2 The Checker 
tfcYear = topFiveChkYear.limit(10).toPandas()
ttcYearAll = topTwoChkYear.toPandas()
tfcYearAll = topFiveChkYear.toPandas()

In [None]:
tfcyaAx = tfcYearAll.plot(kind='bar',x='title',y='total_streams',figsize=(50,10))
tfcyaFig = tfcyaAx.get_figure()
tfcyaFig.savefig('CheckYearNum2.jpg')

In [None]:
#3 The Checker
tfacMonth = topFiveArtistChkMonth.limit(10).toPandas()
ttacMonthAll = topTwoArtistChkMonth.toPandas()
tfacMonthAll = topFiveArtistChkMonth.toPandas()

In [None]:
tfacmaAx = tfacMonthAll.plot(kind='bar',x='artist',y='total_streams',figsize=(300,10))
tfacmaFig = tfacmaAx.get_figure()
tfacmaFig.savefig('ArtistCheckMonthNum3.jpg')

In [None]:
#4 The Checker
tfacYear = topFiveArtistChkYear.limit(10).toPandas()
ttacYearAll = topTwoArtistChkYear.toPandas()
tfacYearAll = topFiveArtistChkYear.toPandas()

In [None]:
tfacyaAx = tfacYearAll.plot(kind='bar',x='artist',y='total_streams',figsize=(50,10))
tfacyaFig = tfacyaAx.get_figure()
tfacyaFig.savefig('ArtistCheckYearNum4.jpg')

In [None]:
#1 Global
tfgMonth = topFiveGlobMonth.limit(10).toPandas()
ttgMonthAll = topTwoGlobMonth.toPandas()
tfgMonthAll = topFiveGlobMonth.toPandas()

In [None]:
tfgmaAx = tfgMonthAll.plot(kind='bar',x='title',y='total_streams',figsize=(500,10))
tfgmaFig = tfgmaAx.get_figure()
tfgmaFig.savefig('GlobalMonthNum1.jpg')

In [None]:
#2 Global
tfgYear = topFiveGlobYear.limit(10).toPandas()
ttgYearAll = topTwoGlobYear.toPandas()
tfgYearAll = topFiveGlobYear.toPandas()

In [None]:
tfgyaAx = tfgYearAll.plot(kind='bar',x='title',y='total_streams',figsize=(50,10))
tfgyaFig = tfgyaAx.get_figure()
tfgyaFig.savefig('GlobalYearNum2.jpg')

In [None]:
#3 Global
tfgaMonth = topFiveGlobArtistMonth.limit(10).toPandas()
ttgaMonthAll = topTwoGlobArtistMonth.toPandas()
tfgaMonthAll = topFiveGlobArtistMonth.toPandas()

In [None]:
tfgamaAx = tfgaMonthAll.plot(kind='bar',x='artist',y='total_streams',figsize=(300,10))
tfgamaFig = tfgamaAx.get_figure()
tfgamaFig.savefig('ArtistGlobalMonthNum3.jpg')

In [None]:
#4 Global
tfgaYear = topFiveGlobArtistYear.limit(10).toPandas()
ttgaYearAll = topTwoGlobArtistYear.toPandas()
tfgaYearAll = topFiveGlobArtistYear.toPandas()

In [None]:
tfgayaAx = tfgaYearAll.plot(kind='bar',x='artist',y='total_streams',figsize=(50,10))
tfgayaFig = tfgayaAx.get_figure()
tfgayaFig.savefig('ArtistGlobalYearNum4.jpg')

### To view all of the Top Twenty Songs and Artists
You will have to type your answers exactly as they are displayed. For example, if the prompt says (yes/no) and you want yes, you have to spell it exactly, because otherwise the loop will restart. Your answers determine which DataFrame is used, and the country and date you enter are used directly to filter that DataFrame, so the correct bar graph can be displayed.
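A more forgiving way to read answers would be a small helper that re-asks until the answer is valid. The sketch below is only an illustration: `ask` and its `read` parameter are hypothetical and are not used by the loop in this notebook.

```python
def ask(prompt, choices, read=input):
    """Keep asking until the answer (trimmed and lower-cased) is one of `choices`."""
    while True:
        answer = read(prompt).strip().lower()
        if answer in choices:
            return answer
        print('Please enter one of: ' + ', '.join(choices))

# Example with canned answers instead of a real keyboard
answers = iter(['Yes ', 'maybe', 'month'])
fake_input = lambda prompt: next(answers)
print(ask('Would you like to Start (yes/no)? ', ['yes', 'no'], read=fake_input))
# the call above prints: yes
print(ask('Would you rather go by month or year? ', ['month', 'year'], read=fake_input))
# the call above first prints a hint for 'maybe', then: month
```

Country names would need to be matched without lower-casing, since the DataFrames store them as, for example, 'United States'.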

There is currently an issue: the bar charts are only displayed once you end the loop. You can keep adding graphs, and once you end the loop it takes a few seconds for all of the charts to display. Calling `plt.show()` right after each `.plot(...)` call inside the loop should make each chart appear immediately.

In [None]:
cont_y = input('Would you like to Start (yes/no)? ')
while cont_y != 'no':
    mth_yr = ''
    date = ''
    comp = ''
    country = ''
    artle = ''
    glob_count = input('Are you looking for country or global? ')
    if glob_count == 'country':
        print('The Possible Countries are: United States, Switzerland, Australia, Brazil, Germany, United Kingdom, Sweden, Austria, Uruguay, or Chile')
        country = input('What Country would you like? ')
    elif glob_count == 'global':
        comp = input('Would you like to compare (yes/no)? ')
    mth_yr = input('Would you rather go by month or year? ')
    if mth_yr == 'month':
        print('To enter the month write as yyyy-MM')
        print('You can access from 2017-01 to 2021-12')
        date = input('Enter the month: ')
    elif mth_yr == 'year':
        print('To enter the year write as yyyy')
        print('You can access from 2017 to 2021')
        date = input('Enter the year: ')
    else:
        print('Please Try again')
        continue
    artle = input('Would you like to search by title or artist? ')
    if artle == 'title' and country != '' and mth_yr == 'month':
        ttMonthAll.loc[(ttMonthAll['month'] == date) & (ttMonthAll['region'] == country)].plot(kind='bar',x='title',y='total_streams',figsize=(20,10))
    elif artle == 'artist' and country != '' and mth_yr == 'month':
        ttaMonthAll.loc[(ttaMonthAll['month'] == date) & (ttaMonthAll['region'] == country)].plot(kind='bar',x='artist',y='total_streams',figsize=(20,10))
    elif artle == 'title' and country != '' and mth_yr == 'year':
        ttYearAll.loc[(ttYearAll['year'] == date) & (ttYearAll['region'] == country)].plot(kind='bar',x='title',y='total_streams',figsize=(20,10))
    elif artle == 'artist' and country != '' and mth_yr == 'year':
        ttaYearAll.loc[(ttaYearAll['year'] == date) & (ttaYearAll['region'] == country)].plot(kind='bar',x='artist',y='total_streams',figsize=(20,10))
    elif artle == 'title' and glob_count == 'global' and mth_yr == 'month':
        if comp == 'yes':
            ttgMonthAll.loc[(ttgMonthAll['month'] == date)].plot(kind='bar',title='Global Value',x='title',y='total_streams',figsize=(20,10))
            ttcMonthAll.loc[(ttcMonthAll['month'] == date)].plot(kind='bar',title='Ten Countries Value',x='title',y='total_streams',figsize=(20,10))
        elif comp == 'no':
            ttgMonthAll.loc[(ttgMonthAll['month'] == date)].plot(kind='bar',x='title',y='total_streams',figsize=(20,10))
        else:
            print('Please Try again')
            continue
    elif artle == 'artist' and glob_count == 'global' and mth_yr == 'month':
        if comp == 'yes':
            ttgaMonthAll.loc[(ttgaMonthAll['month'] == date)].plot(kind='bar',title='Global Value',x='artist',y='total_streams',figsize=(20,10))
            ttacMonthAll.loc[(ttacMonthAll['month'] == date)].plot(kind='bar',title='Ten Countries Value',x='artist',y='total_streams',figsize=(20,10))
        elif comp == 'no':
            ttgaMonthAll.loc[(ttgaMonthAll['month'] == date)].plot(kind='bar',x='artist',y='total_streams',figsize=(20,10))
        else:
            print('Please Try again')
            continue
    elif artle == 'title' and glob_count == 'global' and mth_yr == 'year':
        if comp == 'yes':
            ttgYearAll.loc[(ttgYearAll['year'] == date)].plot(kind='bar',title='Global Value',x='title',y='total_streams',figsize=(20,10))
            ttcYearAll.loc[(ttcYearAll['year'] == date)].plot(kind='bar',title='Ten Countries Value',x='title',y='total_streams',figsize=(20,10))
        elif comp == 'no':
            ttgYearAll.loc[(ttgYearAll['year'] == date)].plot(kind='bar',x='title',y='total_streams',figsize=(20,10))
        else:
            print('Please Try again')
            continue
    elif artle == 'artist' and glob_count == 'global' and mth_yr == 'year':
        if comp == 'yes':
            ttgaYearAll.loc[(ttgaYearAll['year'] == date)].plot(kind='bar',title='Global Value',x='artist',y='total_streams',figsize=(20,10))
            ttacYearAll.loc[(ttacYearAll['year'] == date)].plot(kind='bar',title='Ten Countries Value',x='artist',y='total_streams',figsize=(20,10))
        elif comp == 'no':
            ttgaYearAll.loc[(ttgaYearAll['year'] == date)].plot(kind='bar',x='artist',y='total_streams',figsize=(20,10))
        else:
            print('Please Try again')
            continue
    else:
        print("A value that you entered was not recognized. Please Try again")
        continue
    print('You can continue as much as you like, but to view the top 20 bar graphs you will have to choose no when asked to Continue.')
    cont_y = input('Would you like to Continue (yes/no)? ')