# V3: Bringing it all together for User-Defined Functions

* So far we have learned how to pass values into functions and how functions can return multiple values using tuples.
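
As a quick refresher, here is a minimal sketch of a function that takes two arguments and returns two values packed in a tuple. The function name `min_and_max` and its inputs are hypothetical, chosen only for illustration.

```python
def min_and_max(first, second):
    """Return the smaller and the larger of two values as a tuple."""
    # Pack both results into a single tuple so one return statement suffices
    if first <= second:
        return (first, second)
    return (second, first)


# Unpack the returned tuple into two separate variables
smallest, largest = min_and_max(7, 3)
print(smallest, largest)
```

Unpacking on the left-hand side of the call is what lets a single function hand back several values at once.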



## Example 1: Bringing it all together (1)
You got your first taste of writing your own functions in the previous exercises. You learned how to add parameters to your function definitions, return a single value or multiple values with tuples, and call the functions you've defined.

In this and the following exercise, you will bring together all these concepts and apply them to a simple data science problem. You will load a dataset and develop functionalities to extract simple insights from the data.

For this exercise, your goal is to recall how to load a dataset into a DataFrame. The dataset contains Twitter data and you will iterate over entries in a column to build a dictionary in which the keys are the names of languages and the values are the number of tweets in the given language. The file tweets.csv is available in your current directory.

Be aware that this is real data from Twitter and as such there is always a risk that it may contain profanity or other offensive content (in this exercise, and any following exercises that also use real Twitter data).

### Steps:

1. Import the pandas package with the alias pd.
2. Import the file 'tweets.csv' using the pandas function read_csv(). Assign the resulting DataFrame to df.
3. Complete the for loop by iterating over col, the 'lang' column in the DataFrame df.
4. Complete the bodies of the if-else statements in the for loop: if the key is in the dictionary langs_count, add 1 to the value corresponding to this key in the dictionary, else add the key to langs_count and set the corresponding value to 1. Use the loop variable entry in your code.

In [1]:
# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv('tweets.csv')

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:
    # If the language is in langs_count, add 1
    if entry in langs_count:
        langs_count[entry] += 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)

{'en': 97, 'et': 1, 'und': 2}
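
The if-else counting pattern used in this exercise can also be written more compactly. The sketch below is an alternative, not the exercise's expected solution. It assumes a small hypothetical list, `sample_langs`, in place of the 'lang' column so that it runs without tweets.csv.

```python
from collections import Counter

# Hypothetical stand-in for df['lang']; any iterable of values works the same way
sample_langs = ['en', 'en', 'et', 'und', 'en', 'und']

# Variant 1: dict.get() supplies 0 when the key is missing, replacing the if-else
langs_count = {}
for entry in sample_langs:
    langs_count[entry] = langs_count.get(entry, 0) + 1

# Variant 2: Counter from the standard library does the tallying in one call
langs_counter = Counter(sample_langs)

print(langs_count)
print(dict(langs_counter))
```

Both variants build the same mapping as the explicit loop. On a pandas Series, `value_counts()` gives comparable counts, though by default it leaves out missing values.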


## Example 2: Bringing it all together (2)
Great job! You've now written the logic for iterating over the entries in a column and building a dictionary whose keys are language names and whose values are the number of tweets in each language.

In this exercise, you will define a function with the functionality you developed in the previous exercise, return the resulting dictionary from within the function, and call the function with the appropriate arguments.

For your convenience, the pandas package has been imported as pd and the 'tweets.csv' file has been imported into the tweets_df variable.

### Steps: 

1. Define the function count_entries(), which has two parameters. The first parameter is df for the DataFrame and the second is col_name for the column name.
2. Complete the bodies of the if-else statements in the for loop: if the key is in the dictionary langs_count, add 1 to its current value, else add the key to langs_count and set its value to 1. Use the loop variable entry in your code. (The solution cell below names this dictionary distinct_count.)
3. Return the langs_count dictionary from inside the count_entries() function.
4. Call the count_entries() function by passing to it tweets_df and the name of the column, 'lang'. Assign the result of the call to the variable result.
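
The four steps can be tried out without the CSV file. In this minimal sketch a plain dictionary of lists, `sample_df`, is a hypothetical stand-in for tweets_df. It is assumed to be acceptable here only because `df[col_name]` looks up a column by name in both cases.

```python
def count_entries(df, col_name):
    """Return a dictionary mapping each value in df[col_name] to its count."""
    langs_count = {}
    for entry in df[col_name]:
        if entry in langs_count:
            langs_count[entry] += 1
        else:
            langs_count[entry] = 1
    return langs_count


# Hypothetical stand-in for tweets_df: column names mapped to lists of values
sample_df = {'lang': ['en', 'en', 'et', 'en', 'und']}

# Step 4: call the function and keep the returned dictionary
result = count_entries(sample_df, 'lang')
print(result)
```

Because the function returns the dictionary instead of printing it, the caller can store it in result and reuse it later.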

In [10]:
import pandas as pd
tweets_df = pd.read_csv('tweets.csv')


def count_entries(df, col_name):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    distinct_count = {}

    # Show which column is being processed
    print('From Function: ', col_name, '\n\n')

    for entry in df[col_name]:
        # Increment the count for a value seen before, otherwise start at 1
        if entry in distinct_count:
            distinct_count[entry] += 1
        else:
            distinct_count[entry] = 1

    return distinct_count

#result = count_entries(tweets_df, 'lang')


# print('Result for Lang col is: ', result, '\n\n\n')


# Columns excluded from counting
skip_columns = ('text', 'extended_entities', 'user', 'source',
                'entities', 'quoted_status', 'retweeted_status')

for column in tweets_df.columns:
    # Count only columns whose dtype matches that of 'lang' and that are not excluded
    if tweets_df[column].dtype == tweets_df['lang'].dtype and column not in skip_columns:
        print('\n\n\nThis result for ', column, '\n', count_entries(tweets_df, column))
        
        
#tweets_df.info()

From Function:  created_at 





This result for  created_at 
 {'Tue Mar 29 23:40:17 +0000 2016': 13, 'Tue Mar 29 23:40:18 +0000 2016': 53, 'Tue Mar 29 23:40:19 +0000 2016': 34}
From Function:  filter_level 





This result for  filter_level 
 {'low': 100}
From Function:  in_reply_to_screen_name 





This result for  in_reply_to_screen_name 
 {nan: 89, 'footlooseracer': 1, 'jbrading': 2, 'realDonaldTrump': 4, 'KyleTaylorLucas': 2, 'noreallyhowcome': 1, 'marklevinshow': 1}
From Function:  lang 





This result for  lang 
 {'en': 97, 'et': 1, 'und': 2}
From Function:  place 





This result for  place 
 {nan: 97, "{'id': '736dc4af8e68929c', 'country': 'United States', 'bounding_box': {'type': 'Polygon', 'coordinates': [[[-95.904492, 35.907134], [-95.904492, 36.017384], [-95.851283, 36.017384], [-95.851283, 35.907134]]]}, 'full_name': 'Bixby, OK', 'name': 'Bixby', 'place_type': 'city', 'url': 'https://api.twitter.com/1.1/geo/id/736dc4af8e68929c.json', 'country_code': 'US', 'attributes