## 1. Storing Data

* If we wanted to save the data from the table, we could use two lists or a list of lists.

In [2]:
content_ratings=['4+','9+','12+','17+']
numbers=[4433,987,1155,622]
content_rating_numbers=[['4+','9+','12+','17+'],[4433,987,1155,622]]
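To see why two parallel lists are awkward, here is a minimal sketch of looking up the count for a single rating. The names `position` and `n_apps_9_plus` are illustrative and not part of the original notes.

```python
content_ratings = ['4+', '9+', '12+', '17+']
numbers = [4433, 987, 1155, 622]

# The rating's position has to be found first,
# then reused to read the matching entry in the second list.
position = content_ratings.index('9+')
n_apps_9_plus = numbers[position]
print(n_apps_9_plus)  # 987
```

The two lists only stay in sync as long as every insertion or removal is applied to both of them.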

## 2. Dictionaries

* A dictionary maps each content rating to its corresponding number by following an `index:value` pattern.

In [3]:
content_ratings={'4+':4433,'9+':987,'12+':1155,'17+':622}
print(content_ratings)

{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}


## 3. Indexing
* `variable_name[index]`

In [6]:
content_ratings['17+']

622
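Indexing only works for keys that exist. The sketch below shows what happens with a missing key and how `dict.get` avoids the error. The key `'21+'` and the names `missing_key_raised` and `n_21_plus` are made up for illustration.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# Indexing with a key that is not in the dictionary raises KeyError.
try:
    content_ratings['21+']
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

# dict.get returns a default value instead of raising
# (None if no default is given; here the default is 0).
n_21_plus = content_ratings.get('21+', 0)
print(missing_key_raised, n_21_plus)  # True 0
```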

## 4. Alternative Way of Creating a Dictionary
* `dictionary_name[index] = value`

In [8]:
content_ratings={}
content_ratings['4+']=4433
content_ratings['9+']=987
content_ratings['12+']=1155
content_ratings['17+']=622
over_12_n_apps=content_ratings['12+']
content_ratings

{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

## 5. Key-Value Pairs

![image.png](attachment:image.png)

In [10]:
# A value can be of any type: string, int, float, bool, list, dict
d_1 = {'key_1': 'first_value',
       'key_2': 2,
       'key_3': 3.14,
       'key_4': True,
       'key_5': [4, 2, 1],
       'key_6': {'inner_key': 6}}
d_1

{'key_1': 'first_value',
 'key_2': 2,
 'key_3': 3.14,
 'key_4': True,
 'key_5': [4, 2, 1],
 'key_6': {'inner_key': 6}}
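Values can be of any type, but keys are more restricted: a key has to be hashable. A small sketch of that rule follows; the names `d_2`, `d_3`, and `list_key_allowed` are illustrative.

```python
# Strings, numbers and tuples are hashable, so they can be keys.
d_2 = {'a_string': 1, 7: 2, 3.14: 3, (1, 2): 4}

# A list is mutable and therefore unhashable; using it as a key fails.
try:
    d_3 = {[1, 2]: 'value'}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(len(d_2), list_key_allowed)  # 4 False
```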

## 6. Checking for Membership
An expression of the form **a_value in a_dictionary** always returns a Boolean value:

* True is returned if a_value exists in a_dictionary as a dictionary key.
* False is returned if a_value doesn't exist in a_dictionary as a dictionary key.

In [12]:
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
is_in_dictionary_1= '9+' in content_ratings
is_in_dictionary_2=987 in content_ratings
if '17+' in content_ratings:
    result="It exists"
print(result)

It exists
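Because `in` only looks at the keys, checking for a value needs a different expression. A minimal sketch, using illustrative variable names:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

key_found = '9+' in content_ratings            # True: '9+' is a key
value_as_key = 987 in content_ratings          # False: 987 is a value, not a key
value_found = 987 in content_ratings.values()  # True: searches the values
print(key_found, value_as_key, value_found)    # True False True
```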


## 7. Counting with Dictionaries

In [17]:
from csv import reader

# Read the whole CSV file into a list of rows (the first row is the header).
opened_file = open('C:/Users/krishna/Desktop/JUPYTE~1/Data quest/Step1I~1/PYTHON~1/AppleStore.csv',encoding='utf-8')
read_file = reader(opened_file)
apps_data = list(read_file)
opened_file.close()
print(apps_data[:2])

content_ratings={'4+': 0, '9+': 0, '12+': 0, '17+': 0}

for app in apps_data[1:]:
    c_rating=app[10]
    if c_rating in content_ratings:
        content_ratings[c_rating]+=1
print(content_ratings)

[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']]
{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}


## 8. Finding the Unique Values

Previously, we created the dictionary {'4+': 0, '9+': 0, '12+': 0, '17+': 0} before we looped over the data set to count the occurrence of each content rating. Unfortunately, this approach requires us to know beforehand the unique values we want to count.

Let's say we didn't know what the unique content ratings are. This means that we don't have enough information to create the dictionary {'4+': 0, '9+': 0, '12+': 0, '17+': 0}. We need to devise a way to extract this information.

In [18]:

content_ratings={}

for app in apps_data[1:]:
    c_rating=app[10]
    if c_rating in content_ratings:
        content_ratings[c_rating] +=1
    else:
        content_ratings[c_rating]=1
print(content_ratings)

{'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
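The same counting pattern works for any column, so it can be wrapped in a function. The sketch below is one possible way to do that; the function name `freq_table` and the sample rows are assumptions made for illustration and do not come from `AppleStore.csv`.

```python
def freq_table(rows, index):
    """Count how often each value appears in column `index` of `rows`."""
    table = {}
    for row in rows:
        value = row[index]
        # dict.get supplies 0 the first time a value is seen,
        # which replaces the explicit if/else branch.
        table[value] = table.get(value, 0) + 1
    return table

# Made-up rows for illustration only.
sample_rows = [['app_a', '4+'], ['app_b', '9+'], ['app_c', '4+'], ['app_d', '17+']]
sample_table = freq_table(sample_rows, 1)
print(sample_table)  # {'4+': 2, '9+': 1, '17+': 1}
```

With the real data set, a call such as `freq_table(apps_data[1:], 10)` would be expected to reproduce the content rating counts.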


## 9. Proportions and Percentages

* What proportion of apps have a content rating of 4+?
* What percentage of apps have a content rating of 17+?
* What percentage of apps can a 15-year-old download?

In [21]:
genre_counting={}

for app in apps_data[1:]:
    genre=app[11]
    if genre in genre_counting:
        genre_counting[genre]+=1
    else:
        genre_counting[genre]=1
print(genre_counting)    

{'Social Networking': 167, 'Photo & Video': 349, 'Games': 3862, 'Music': 138, 'Reference': 64, 'Health & Fitness': 180, 'Weather': 72, 'Utilities': 248, 'Travel': 81, 'Shopping': 122, 'News': 75, 'Navigation': 46, 'Lifestyle': 144, 'Entertainment': 535, 'Food & Drink': 63, 'Sports': 114, 'Book': 112, 'Finance': 104, 'Education': 453, 'Productivity': 178, 'Business': 57, 'Catalogs': 10, 'Medical': 23}
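Raw counts become easier to compare once they are divided by the total number of apps. The sketch below does this for three of the genre counts printed above; the names `sample_genre_counts` and `genre_percentages` are illustrative, and the total of 7197 is the number of apps in the data set.

```python
# Three counts copied from the genre frequency table.
sample_genre_counts = {'Games': 3862, 'Entertainment': 535, 'Education': 453}
total_number_of_apps = 7197

genre_percentages = {}
for genre in sample_genre_counts:
    proportion = sample_genre_counts[genre] / total_number_of_apps
    genre_percentages[genre] = proportion * 100

# Round only for display; the dictionary keeps full precision.
for genre in genre_percentages:
    print(genre, round(genre_percentages[genre], 2))
```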


## 10. Looping over Dictionaries

In [24]:
# What percentage of apps can a 15-year-old download?

content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197
for key in content_ratings:
    content_ratings[key] /=total_number_of_apps
    content_ratings[key] *=100
percentage_17_plus=content_ratings['17+']
percentage_15_allowed=content_ratings['4+']+content_ratings['12+'] + content_ratings['9+']
print("Percentage of apps a 15-year-old can download:", percentage_15_allowed)
print("Percentage of apps with a content rating of 17+:", percentage_17_plus)

Percentage of apps a 15-year-old can download: 91.35751007364179
Percentage of apps with a content rating of 17+: 8.642489926358204
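A `for` loop over a dictionary yields its keys. When both the key and the value are needed, `dict.items()` gives them together. The sketch below is a variant of the loop above that leaves the original counts untouched; deriving the total with `sum` is a choice made for this example.

```python
content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}

# The total can be derived from the counts instead of being typed in.
total_number_of_apps = sum(content_ratings.values())  # 7197

# .items() yields (key, value) pairs, so no second lookup is needed,
# and the dictionary itself is not modified.
for rating, count in content_ratings.items():
    print(rating, round(count / total_number_of_apps * 100, 2))
```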


## 11. Keeping the Dictionaries Separate

We might want to keep three separate dictionaries: one storing frequencies, another storing proportions, and another storing percentages.

In [28]:
content_ratings = {'4+': 4433, '12+': 1155, '9+': 987, '17+': 622}
total_number_of_apps = 7197
c_ratings_proportions={}
c_ratings_percentages={}
for key in content_ratings:
    c_ratings_proportions[key]=content_ratings[key]/total_number_of_apps
    c_ratings_percentages[key]=c_ratings_proportions[key]*100
print(c_ratings_proportions)
print(c_ratings_percentages)

{'4+': 0.6159510907322495, '12+': 0.16048353480616923, '9+': 0.13714047519799916, '17+': 0.08642489926358204}
{'4+': 61.595109073224954, '12+': 16.04835348061692, '9+': 13.714047519799916, '17+': 8.642489926358204}


## 12. Frequency Tables for Numerical Columns

* Creating frequency tables for certain columns may result in creating lengthy dictionaries because of the large number of unique values. 
* A lengthy frequency table is difficult to analyze. The lengthier the table, the harder it becomes to see any patterns.
* Using intervals helps us segment the data into groups, which eases analysis. 

In [30]:
# Find the minimum and maximum app sizes (in bytes) to decide on the intervals
data_sizes=[]
for row in apps_data[1:]:
    size=float(row[2])
    data_sizes.append(size)
min_size=min(data_sizes)
max_size=max(data_sizes)
print(min_size)
print(max_size)

589824.0
4025969664.0


## 13. Filtering for the Intervals

We want to store the frequency table as a dictionary. We begin by creating a dictionary with the intervals as dictionary keys and frequencies as dictionary values (we initialize all frequencies with zero):

In [31]:
user_ratings_freq = {'0 - 10000': 0, '10000 - 100000': 0, '100000 - 500000': 0,
                     '500000 - 1000000': 0, '1000000+': 0}

for row in apps_data[1:]:
    user_ratings = int(row[5])  # rating_count_tot column

    # Each elif is reached only if the previous checks failed,
    # so only the upper bound of each interval needs testing.
    if user_ratings <= 10000:
        user_ratings_freq['0 - 10000'] += 1
    elif user_ratings <= 100000:
        user_ratings_freq['10000 - 100000'] += 1
    elif user_ratings <= 500000:
        user_ratings_freq['100000 - 500000'] += 1
    elif user_ratings <= 1000000:
        user_ratings_freq['500000 - 1000000'] += 1
    else:
        user_ratings_freq['1000000+'] += 1
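To check how values on the interval boundaries are assigned, the same logic can be run on a handful of numbers without loading the CSV file. This is a sketch under assumptions: the helper `interval_label`, the `intervals` list, and the sample counts are all made up for illustration.

```python
# Inclusive upper bounds paired with the interval labels of the frequency table.
intervals = [(10000, '0 - 10000'), (100000, '10000 - 100000'),
             (500000, '100000 - 500000'), (1000000, '500000 - 1000000')]

def interval_label(n_ratings):
    """Return the label of the interval that `n_ratings` falls into."""
    for upper_bound, label in intervals:
        if n_ratings <= upper_bound:
            return label
    return '1000000+'

# Made-up rating counts, chosen to hit the interval boundaries.
sample_counts = [0, 10000, 10001, 500000, 1000000, 2974676]
sample_freq = {}
for n in sample_counts:
    label = interval_label(n)
    sample_freq[label] = sample_freq.get(label, 0) + 1
print(sample_freq)
```

Keeping the bounds in a list means the intervals can be changed in one place instead of editing every branch.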


In this mission, we learned about dictionaries and how to use them to build frequency tables. Frequency tables are common in data science practice.