# Python Data Science Toolbox (Part 1)

## Default arguments, variable-length arguments and scope

#### Bringing it all together (1)

Recall the *Bringing it all together* exercise in the previous chapter, where you did a simple Twitter analysis by developing a function that counts how many tweets are in certain languages. The output of your function was a dictionary with the languages as the keys and the counts of tweets in each language as the values.
In this exercise, you will generalize that Twitter language analysis by adding a **default argument** that takes a column name.

For your convenience, `pandas` has been imported as `pd` and the `tweets.csv` file has been imported into the DataFrame `tweets_df`. Parts of the code from your previous work are also provided.

#### Instructions
* Complete the function header by supplying the parameter for a DataFrame `df` and the parameter `col_name` with a default value of `'lang'` for the DataFrame column name.
* Call `count_entries()` by passing the `tweets_df` DataFrame and the column name `'lang'`. Assign the result to `result1`. Note that since `'lang'` is the default value of the `col_name` parameter, you don't have to specify it here.
* Call `count_entries()` by passing the `tweets_df` DataFrame and the column name `'source'`. Assign the result to `result2`.
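
The calling styles described above can be sketched on a toy stand-in for the data. This is only an illustration: `toy_tweets`, its values, and the name `count_entries_sketch` are hypothetical, and a plain dict of lists is used because it supports the same `data[col_name]` lookup that the exercise relies on.

```python
# Hypothetical toy data standing in for tweets_df (values are made up).
toy_tweets = {
    "lang": ["en", "en", "und"],
    "source": ["web", "iphone", "web"],
}

def count_entries_sketch(df, col_name="lang"):
    """Return a dict mapping each entry in df[col_name] to its count."""
    counts = {}
    for entry in df[col_name]:
        # dict.get() returns 0 the first time an entry is seen
        counts[entry] = counts.get(entry, 0) + 1
    return counts

# col_name omitted: the default "lang" is used
default_call = count_entries_sketch(toy_tweets)

# col_name overridden positionally, then by keyword (equivalent calls)
positional_call = count_entries_sketch(toy_tweets, "source")
keyword_call = count_entries_sketch(toy_tweets, col_name="source")

print(default_call)     # {'en': 2, 'und': 1}
print(positional_call)  # {'web': 2, 'iphone': 1}
```

Passing the column by keyword tends to read more clearly once a function has several optional parameters.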

In [2]:
import pandas as pd
tweets_df = pd.read_csv("../datasets/tweets.csv")
tweets_df.head()

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,filter_level,geo,id,...,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,timestamp_ms,truncated,user
0,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...","{'media': [{'sizes': {'large': {'w': 1024, 'h'...",0,False,low,,714960401759387648,...,,,0,False,"{'retweeted': False, 'text': "".@krollbondratin...","<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @bpolitics: .@krollbondrating's Christopher...,1459294817758,False,"{'utc_offset': 3600, 'profile_image_url_https'..."
1,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [{'text': 'cruzsexscandal', 'indi...","{'media': [{'sizes': {'large': {'w': 500, 'h':...",0,False,low,,714960401977319424,...,,,0,False,"{'retweeted': False, 'text': '@dmartosko Cruz ...","<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @HeidiAlpine: @dmartosko Cruz video found.....,1459294817810,False,"{'utc_offset': None, 'profile_image_url_https'..."
2,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [], 'symbols...",,0,False,low,,714960402426236928,...,,,0,False,,"<a href=""http://www.facebook.com/twitter"" rel=...",Njihuni me Zonjën Trump !!! | Ekskluzive http...,1459294817917,False,"{'utc_offset': 7200, 'profile_image_url_https'..."
3,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [], 'symbols...",,0,False,low,,714960402367561730,...,7.149239e+17,7.149239e+17,0,False,,"<a href=""http://twitter.com/download/android"" ...",Your an idiot she shouldn't have tried to grab...,1459294817903,False,"{'utc_offset': None, 'profile_image_url_https'..."
4,,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_na...",,0,False,low,,714960402149416960,...,,,0,False,"{'retweeted': False, 'text': 'The anti-America...","<a href=""http://twitter.com/download/iphone"" r...",RT @AlanLohner: The anti-American D.C. elites ...,1459294817851,False,"{'utc_offset': -18000, 'profile_image_url_http..."


In [4]:
# Define count_entries()
def count_entries(df, col_name="lang"):
    """Return a dictionary with counts of occurrences as value for each key."""

    # Initialize an empty dictionary: cols_count
    cols_count = {}

    # Extract column from DataFrame: col
    col = df[col_name]

    # Iterate over the column in DataFrame
    for entry in col:

        # If entry is in cols_count, add 1
        if entry in cols_count:
            cols_count[entry] += 1
        # Else add the entry to cols_count, set the value to 1
        else:
            cols_count[entry] = 1

    # Return the cols_count dictionary
    return cols_count

# Call count_entries(): result1
result1 = count_entries(tweets_df)

# Call count_entries(): result2
result2 = count_entries(tweets_df, 'source')

# Print result1 and result2
display(result1)
display(result2)

{'en': 97, 'et': 1, 'und': 2}

{'<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>': 24,
 '<a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>': 1,
 '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>': 26,
 '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>': 33,
 '<a href="http://www.twitter.com" rel="nofollow">Twitter for BlackBerry</a>': 2,
 '<a href="http://www.google.com/" rel="nofollow">Google</a>': 2,
 '<a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>': 6,
 '<a href="http://linkis.com" rel="nofollow">Linkis.com</a>': 2,
 '<a href="http://rutracker.org/forum/viewforum.php?f=93" rel="nofollow">newzlasz</a>': 2,
 '<a href="http://ifttt.com" rel="nofollow">IFTTT</a>': 1,
 '<a href="http://www.myplume.com/" rel="nofollow">Plume for Android</a>': 1}
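
As a possible alternative to the explicit loop, the counting can be delegated to `collections.Counter` from the standard library. The sketch below is not part of the exercise: the function name `count_entries_counter` and the toy data are hypothetical, and a dict of lists again stands in for the DataFrame.

```python
from collections import Counter

def count_entries_counter(df, col_name="lang"):
    """Same contract as count_entries(), with the tallying done by Counter."""
    # Counter consumes any iterable and tallies each distinct value
    return dict(Counter(df[col_name]))

# Hypothetical toy data (not taken from tweets.csv)
toy = {"lang": ["en", "et", "en", "und", "en"]}

toy_counts = count_entries_counter(toy)
print(toy_counts)  # {'en': 3, 'et': 1, 'und': 1}
```

When working with a real DataFrame, `df[col_name].value_counts()` should give comparable counts as a pandas Series, sorted by frequency, rather than as a dictionary.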