# Files

## Read a File

Read the content of the file `files/poem.txt` and print it.

In [2]:
with open("files/poem.txt", "r") as file:
    cont = file.read()
print(cont)

The road not taken
By Robert Frost

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;



## Write to a File

Write a note to the `files/note.txt` file that says, "Remember to buy milk and eggs."

In [7]:
# Write to files/note.txt, as the task asks (writing to poem.txt would overwrite the poem)
with open("files/note.txt", "w") as file:
    file.write("Remember to buy milk and eggs.\n")

with open("files/note.txt", "r") as file:
    cont = file.read()
print(cont)

Remember to buy milk and eggs.



## Append to a File

Append a new task to the previous `files/note.txt` file on a new line: "And Beer!"

In [8]:
# Mode "a" adds to the end of the file instead of replacing its content
with open("files/note.txt", "a") as file:
    file.write("And Beer!\n")

with open("files/note.txt", "r") as file:
    cont = file.read()
print(cont)


Remember to buy milk and eggs.
And Beer!



## Counting Lines

Count the number of times "apple" appears in the `files/fruits.txt` file and print the count (it should be 5).

In [14]:
with open("files/fruits.txt", "r") as f:
    cont = f.read()
#print(cont)
n = cont.count("apple")
print(n)

5


## Copy

Copy the content from `files/source.txt` to the new file `files/destination.txt`.

In [16]:
with open("files/source.txt", "r") as f:
    cont = f.read()
print(cont)
with open("files/destination.txt", "w") as f:
    f.write(cont)
with open("files/destination.txt", "r") as f:
    cont = f.read()
print(cont)

This is the source file.
It has multiple lines.
Let's copy this content to another file.
This is the source file.
It has multiple lines.
Let's copy this content to another file.


## Working With CSV

Open the `files/grades.csv` file, calculate the average grade for each student, and print their names along with their average grades.

Solve this exercise in 3 different ways:
- Read the file in plain text (without using any CSV module).
- Use `csv` module.
- Use `pandas` module.

In [21]:
import csv
import pandas as pd

with open("files/grades.csv", "r") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

print("")

with open("files/grades.csv", "r") as file:
    cont = file.read()
print(cont)

print("")

df = pd.read_csv("files/grades.csv")
print(df)


['Name', 'Math', 'Science', 'English']
['Alice', '90', '85', '88']
['Bob', '82', '95', '79']
['Charlie', '85', '80', '90']
['John', '64', '26', '91']
['Amy', '43', '65', '58']
['Hector', '76', '74', '79']
['Mark', '32', '41', '29']

Name,Math,Science,English
Alice,90,85,88
Bob,82,95,79
Charlie,85,80,90
John,64,26,91
Amy,43,65,58
Hector,76,74,79
Mark,32,41,29

      Name  Math  Science  English
0    Alice    90       85       88
1      Bob    82       95       79
2  Charlie    85       80       90
3     John    64       26       91
4      Amy    43       65       58
5   Hector    76       74       79
6     Mark    32       41       29
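
The cell above only displays the file in three ways; it does not yet compute the averages. The following is a minimal sketch of that step for the plain-text and `csv` approaches. The helper names (`averages_plain`, `averages_csv`) are hypothetical, and the sample rows are copied from the output above and read through `io.StringIO` so the block runs without the file.

```python
import csv
import io

# First rows of files/grades.csv, copied from the output above so the
# sketch runs on its own; in the notebook, pass open("files/grades.csv").
SAMPLE = """Name,Math,Science,English
Alice,90,85,88
Bob,82,95,79
Charlie,85,80,90
"""

def averages_plain(handle):
    """Average per student by splitting each line on commas (no CSV module)."""
    lines = [line.strip() for line in handle if line.strip()]
    result = {}
    for line in lines[1:]:  # skip the header row
        name, *grades = line.split(",")
        result[name] = sum(int(g) for g in grades) / len(grades)
    return result

def averages_csv(handle):
    """Same calculation with csv.DictReader, which uses the header as keys."""
    result = {}
    for row in csv.DictReader(handle):
        name = row.pop("Name")
        grades = [int(value) for value in row.values()]
        result[name] = sum(grades) / len(grades)
    return result

plain = averages_plain(io.StringIO(SAMPLE))
with_csv = averages_csv(io.StringIO(SAMPLE))
for name, avg in with_csv.items():
    print(f"{name}: {avg:.2f}")
```

With pandas, the same averages should be obtainable from the `df` loaded above with `df.set_index("Name").mean(axis=1)`.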


## Simple

Given a JSON file `files/simple.json`, add a new key-value pair "Language": "Python" and write the modified data back to the file.

In [22]:
import json

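with open("files/simple.json", "r") as f:
    cont = json.load(f)

# Add the new key-value pair and write the modified data back to the file
cont["Language"] = "Python"
with open("files/simple.json", "w") as f:
    json.dump(cont, f, indent=4)
print(cont)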

{'Name': 'John', 'Age': 25, 'City': 'New York', 'Language': 'Python'}


## Read and Display

Read data from the `files/info.yml` YAML file and display the content in the following order: `name`, `age`, and `hobbies`.

In [28]:
import yaml

with open("files/info.yml", "r") as f:
    cont = yaml.safe_load(f)
print(cont)

{'person': {'name': 'Alice', 'age': 30, 'hobbies': ['reading', 'hiking', 'swimming']}}
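
The cell above prints the whole dictionary at once. One possible way to display the fields in the requested order is sketched below; the dictionary literal is copied from the output above so the block does not need the `yaml` package, and `ordered_fields` is a hypothetical helper.

```python
# Structure copied from the yaml.safe_load output above.
data = {"person": {"name": "Alice", "age": 30,
                   "hobbies": ["reading", "hiking", "swimming"]}}

def ordered_fields(record, keys=("name", "age", "hobbies")):
    """Return (key, value) pairs in the requested order."""
    return [(key, record[key]) for key in keys]

for key, value in ordered_fields(data["person"]):
    # Join list values such as hobbies into one readable string
    shown = ", ".join(value) if isinstance(value, list) else value
    print(f"{key}: {shown}")
```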


## Exception Handling

Try to open `files/missing.txt`. If there's an error related to the file not existing, print "The file does not exist." and set the contents of the file to `None`.

In [30]:
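try:
    # The with block closes the file automatically, so no finally is needed
    with open("files/missing.txt", "r") as f:
        cont = f.read()
    print(cont)
except FileNotFoundError:
    # Only handle the "file does not exist" case, as the task asks
    print("The file does not exist.")
    cont = None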

The file does not exist.


## Word Frequency

Calculate the frequency of each word in the `files/bigtext.txt` file and determine the top 10 most frequent words.

In [35]:
with open("files/bigtext.txt", "r") as f:
    cont = f.read()
print(cont)
l = list(cont.split(" "))
d = {}
for wo in l:
    d[wo] = d.get(wo, 0) + 1
d_or = dict(sorted(d.items(), key=lambda item: item[1], reverse=True)[:10])
print(d_or)

Feel free to use an excerpt from a book, article, or any long text. For the purpose of this exercise, assume it's a long passage with multiple words.
{'a': 2, 'long': 2, 'Feel': 1, 'free': 1, 'to': 1, 'use': 1, 'an': 1, 'excerpt': 1, 'from': 1, 'book,': 1}
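
Splitting on single spaces keeps punctuation attached (note `'book,'` in the output above) and treats `Feel` and `feel` as different words. A minimal alternative sketch with `collections.Counter` follows; `top_words` is a hypothetical helper, and lowercasing plus the letters-and-apostrophes pattern are assumptions about what should count as a word.

```python
import re
from collections import Counter

def top_words(text, n=10):
    """Return the n most frequent words, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

# Sample text copied from the content of files/bigtext.txt shown above.
sample = ("Feel free to use an excerpt from a book, article, or any long text. "
          "For the purpose of this exercise, assume it's a long passage with "
          "multiple words.")
print(top_words(sample, 3))
```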


## Nested Parsing

List the IDs of users who have more than 5 incomplete tasks in the `files/todos.json` file.

In [16]:
import pandas as pd

df = pd.read_json("files/todos.json")
inc_task = df[df['completed'] == False]
#print(inc_task)
inc_count = inc_task["userId"].value_counts()
inc_count2 = inc_task.groupby('userId').size()   # same counts as the previous line computed another way, but already sorted by userId
#print(inc_count)
#print(inc_count2)
users_5_inc = inc_count[inc_count > 5]   # filter with the same Series to avoid relying on index alignment
#print(users_5_inc)
users_id = users_5_inc.index.tolist()
print(users_id)
user_info = [[user_id, task_count] for user_id, task_count in zip(users_5_inc.index.tolist(), users_5_inc.values.tolist())]  # pairs each user_id with its count of incomplete tasks
print(user_info)



[4, 6, 3, 2, 9, 7, 1, 8, 5, 10]
[[4, 14], [6, 14], [3, 13], [2, 12], [9, 12], [7, 11], [1, 9], [8, 9], [5, 8], [10, 8]]


## Log Analyzer

Given a log file `files/logs.txt` with entries of the form `[Timestamp] [LOG LEVEL] Message`, extract and count the number of occurrences of each log level (e.g., INFO, WARN, ERROR).

In [49]:
with open("files/logs.txt", "r") as f:
    cont = f.read()
#print(cont)

words = list(cont.split(" "))
#print(words)
count_inf = words.count("[INFO]")
count_war = words.count("[WARN]")
count_err = words.count("[ERROR]")

print(f"[INFO] = {count_inf}, [WARN] = {count_war}, [ERROR] = {count_err}")

[INFO] = 12, [WARN] = 4, [ERROR] = 6
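
Counting space-separated tokens would also count a level name that happens to appear inside a message, and it needs one variable per level. A sketch that reads the level only from the second bracketed field of each entry is shown below; `count_levels` is a hypothetical helper, and the sample entries (including their timestamp format) are invented for illustration.

```python
import re
from collections import Counter

# Matches "[timestamp] [LEVEL] message" and captures LEVEL.
ENTRY = re.compile(r"^\[[^\]]*\]\s*\[([A-Z]+)\]")

def count_levels(lines):
    """Count how many entries carry each log level."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match:  # lines that do not follow the format are skipped
            counts[match.group(1)] += 1
    return counts

# Hypothetical entries; the real files/logs.txt may use another timestamp format.
sample = [
    "[2023-01-01 10:00:00] [INFO] Service started",
    "[2023-01-01 10:05:00] [ERROR] Could not reach [INFO] endpoint",
    "[2023-01-01 10:06:00] [WARN] Retrying",
    "not a log entry",
]
print(dict(count_levels(sample)))
```

In the notebook, `count_levels(open("files/logs.txt"))` would iterate over the file line by line.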


## Posts

In the following exercises use the files:
- `files/users.json`
- `files/posts.json`
- `files/comments.json`

In [2]:
import pandas as pd

df_use = pd.read_json("files/users.json")
#print(df_use)
df_pos = pd.read_json("files/posts.json")
#print(df_pos)
df_com = pd.read_json("files/comments.json")
#print(df_com)
print(df_use.columns)
print(df_pos.columns)
print(df_com.columns)

Index(['id', 'name', 'username', 'email', 'address', 'phone', 'website',
       'company'],
      dtype='object')
Index(['userId', 'id', 'title', 'body'], dtype='object')
Index(['postId', 'id', 'name', 'email', 'body'], dtype='object')


### Compare Data

Find and print the `postId` values that are present in both the posts and the comments in `files/posts.json` and `files/comments.json` files.

In [72]:
df_pos = df_pos.rename(columns = {"id":"postId"})
df_merg = pd.merge(df_pos, df_com, on = "postId")
post_id = df_merg["postId"].unique()
print(post_id)

[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  91  92  93  94  95  96  97  98  99 100]


### User Activity Logs

- Read `files/users.json` and write each user's name and email to a new `files/users/users_names.txt` file.
- Read `files/posts.json` and segregate them by users. Write each user's posts to a separate file named after them: `files/users/{user_id}_{user_name}.txt`.

In [106]:
import os

#print(df_use[["username","email"]])
with open("files/users/users_names.txt", "w") as f:   # "w" so that re-running the cell does not duplicate the lines
    for index, row in df_use.iterrows():
        f.write(f"username: {row['username']}, email: {row['email']}\n")

with open("files/users/users_names.txt", "r") as f:
    cont = f.read()
print(cont)

username: Bret, email: Sincere@april.biz
username: Antonette, email: Shanna@melissa.tv
username: Samantha, email: Nathan@yesenia.net
username: Karianne, email: Julianne.OConner@kory.org
username: Kamren, email: Lucio_Hettinger@annie.ca
username: Leopoldo_Corkery, email: Karley_Dach@jasper.info
username: Elwyn.Skiles, email: Telly.Hoeger@billy.biz
username: Maxime_Nienow, email: Sherwood@rosamond.me
username: Delphine, email: Chaim_McDermott@dana.io
username: Moriah.Stanton, email: Rey.Padberg@karina.biz
username: AliceJ, email: alice@example.com
username: BobS, email: bob@example.com
username: CharlieG, email: charlie@example.net
username: DianaP, email: diana@example.org
username: EllaF, email: ella@example.com
username: FrankS, email: frank@example.com
username: GraceW, email: grace@example.net
username: HankM, email: hank@example.org



In [111]:
df_merg = pd.merge(df_pos, df_use, left_on = "userId", right_on = "id")
#print(df_merg.columns)

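for _, row in df_merg.iterrows():
    user_id = row["userId"]   # avoid shadowing the built-in id()
    name = row["name"]
    us_path = os.path.join("files/users", f"{user_id}_{name}.txt")
    # Append mode: each user has several posts, written one row at a time
    with open(us_path, 'a') as f:
        f.write(row['body'] + '\n')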

### Post Length Analysis

Analyze the posts and create a file, `post_length_analysis.txt`, that categorizes posts based on their length: "Short" (0-50 chars), "Medium" (51-200 chars), and "Long" (>200 chars). List the number of posts in each category.

In [125]:
#print(df_pos["body"].str.len())
short = 0
med = 0
lon = 0
for i in df_pos["body"].str.len():
    if i <= 50:
        short += 1
    elif i <=200:
        med += 1
    else:
        lon += 1
print(short, med, lon)
with open("files/users/post_length_analysis.txt", "w") as f:   # "w" so that re-running the cell replaces the old counts
    f.write(f"short = {short}\nmedium = {med}\nlong = {lon}")

0 96 9


### Email Domain Counter

Analyze the email addresses of all users and list the frequency of each email domain (e.g., @example.com) in a file called `email_domains.txt`.

In [148]:
#print(df_use["email"].values)
d = {}
for val in df_use["email"].values:
    end = val[val.index("@") + 1:]
    d[end] = d.get(end, 0) + 1

with open("files/users/email_domains.txt", "w") as f:   # "w" so that re-running the cell does not duplicate the lines
    for key, value in d.items():
        f.write(f"{key}: {value}\n")
print(d)


{'april.biz': 1, 'melissa.tv': 1, 'yesenia.net': 1, 'kory.org': 1, 'annie.ca': 1, 'jasper.info': 1, 'billy.biz': 1, 'rosamond.me': 1, 'dana.io': 1, 'karina.biz': 1, 'example.com': 4, 'example.net': 2, 'example.org': 2}


### Most Engaged Users

Identify the top 3 users who've made the most posts and received the highest number of comments on their posts. Generate a `most_engaged_users.json` that contains the user details, total posts, and total comments received.

In [160]:
df_pos

Unnamed: 0,userId,id,title,body
0,1,1,sunt aut facere repellat provident occaecati e...,quia et suscipit\nsuscipit recusandae consequu...
1,1,2,qui est esse,est rerum tempore vitae\nsequi sint nihil repr...
2,1,3,ea molestias quasi exercitationem repellat qui...,et iusto sed quo iure\nvoluptatem occaecati om...
3,1,4,eum et est occaecati,ullam et saepe reiciendis voluptatem adipisci\...
4,1,5,nesciunt quas odio,repudiandae veniam quaerat sunt sed\nalias aut...
...,...,...,...,...
100,1,101,Exploring Wonderland,I had a great time exploring Wonderland. It wa...
101,2,102,Life in Bobsville,"Bobsville is amazing. AliceJ and CharlieG, hav..."
102,3,103,Charlie's adventures,I traveled the world and found so many interes...
103,4,104,Warrior life in Themyscira,Being a warrior in Themyscira is challenging b...


In [163]:
df_com

Unnamed: 0,postId,id,name,email,body
0,1,1,id labore ex et quam laborum,Eliseo@gardner.biz,laudantium enim quasi est quidem magnam volupt...
1,1,2,quo vero reiciendis velit similique earum,Jayne_Kuhic@sydney.com,est natus enim nihil est dolore omnis voluptat...
2,1,3,odio adipisci rerum aut animi,Nikita@garfield.biz,quia molestiae reprehenderit quasi aspernatur\...
3,1,4,alias odio sit,Lew@alysha.tv,non et atque\noccaecati deserunt quas accusant...
4,1,5,vero eaque aliquid doloribus et culpa,Hayden@althea.biz,harum non quasi et ratione\ntempore iure ex vo...
...,...,...,...,...,...
495,100,496,et occaecati asperiores quas voluptas ipsam no...,Zola@lizzie.com,neque unde voluptatem iure\nodio excepturi ips...
496,100,497,doloribus dolores ut dolores occaecati,Dolly@mandy.co.uk,non dolor consequatur\nlaboriosam ut deserunt ...
497,100,498,dolores minus aut libero,Davion@eldora.net,aliquam pariatur suscipit fugiat eos sunt\nopt...
498,100,499,excepturi sunt cum a et rerum quo voluptatibus...,Wilburn_Labadie@araceli.name,et necessitatibus tempora ipsum quaerat invent...


In [13]:
df_use

Unnamed: 0,id,name,username,email,address,phone,website,company
0,1,Leanne Graham,Bret,Sincere@april.biz,"{'street': 'Kulas Light', 'suite': 'Apt. 556',...",1-770-736-8031 x56442,hildegard.org,"{'name': 'Romaguera-Crona', 'catchPhrase': 'Mu..."
1,2,Ervin Howell,Antonette,Shanna@melissa.tv,"{'street': 'Victor Plains', 'suite': 'Suite 87...",010-692-6593 x09125,anastasia.net,"{'name': 'Deckow-Crist', 'catchPhrase': 'Proac..."
2,3,Clementine Bauch,Samantha,Nathan@yesenia.net,"{'street': 'Douglas Extension', 'suite': 'Suit...",1-463-123-4447,ramiro.info,"{'name': 'Romaguera-Jacobson', 'catchPhrase': ..."
3,4,Patricia Lebsack,Karianne,Julianne.OConner@kory.org,"{'street': 'Hoeger Mall', 'suite': 'Apt. 692',...",493-170-9623 x156,kale.biz,"{'name': 'Robel-Corkery', 'catchPhrase': 'Mult..."
4,5,Chelsey Dietrich,Kamren,Lucio_Hettinger@annie.ca,"{'street': 'Skiles Walks', 'suite': 'Suite 351...",(254)954-1289,demarco.info,"{'name': 'Keebler LLC', 'catchPhrase': 'User-c..."
5,6,Mrs. Dennis Schulist,Leopoldo_Corkery,Karley_Dach@jasper.info,"{'street': 'Norberto Crossing', 'suite': 'Apt....",1-477-935-8478 x6430,ola.org,"{'name': 'Considine-Lockman', 'catchPhrase': '..."
6,7,Kurtis Weissnat,Elwyn.Skiles,Telly.Hoeger@billy.biz,"{'street': 'Rex Trail', 'suite': 'Suite 280', ...",210.067.6132,elvis.io,"{'name': 'Johns Group', 'catchPhrase': 'Config..."
7,8,Nicholas Runolfsdottir V,Maxime_Nienow,Sherwood@rosamond.me,"{'street': 'Ellsworth Summit', 'suite': 'Suite...",586.493.6943 x140,jacynthe.com,"{'name': 'Abernathy Group', 'catchPhrase': 'Im..."
8,9,Glenna Reichert,Delphine,Chaim_McDermott@dana.io,"{'street': 'Dayna Park', 'suite': 'Suite 449',...",(775)976-6794 x41206,conrad.com,"{'name': 'Yost and Sons', 'catchPhrase': 'Swit..."
9,10,Clementina DuBuque,Moriah.Stanton,Rey.Padberg@karina.biz,"{'street': 'Kattie Turnpike', 'suite': 'Suite ...",024-648-3804,ambrose.net,"{'name': 'Hoeger LLC', 'catchPhrase': 'Central..."


In [166]:
df_pos['userId'] == uid

0      False
1      False
2      False
3      False
4      False
       ...  
100    False
101    False
102    False
103    False
104    False
Name: userId, Length: 105, dtype: bool

In [30]:
import json

cont = 0
l = []


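for uid in df_use['id']:
    n_posts = sum(df_pos['userId'] == uid)
    #print(uid, n_posts)
    tot = 0
    # NOTE: this loop visits every post, not only the posts written by `uid`,
    # so `tot` ends up as the total number of comments for every user.
    for id_com in df_pos["id"]:
        n_com = sum(df_com["postId"] == id_com)
        tot += n_com
        #print(uid, n_posts, id_com, n_com, tot)
    name = df_use[df_use["id"] == uid]["name"].values[0]
    # NOTE: this keeps the first three users in file order, not the top three.
    if (cont < 3):
        d = {}
        d["User"] = name
        d["Posts"] = n_posts
        d["Comments"] = tot
        l.append(d)
    cont += 1
    print(name, n_posts, tot)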

with open("files/users/most_engaged_users1.json", "w") as f:
    json.dump(l, f)



#print(l)


Leanne Graham 12 500
Ervin Howell 11 500
Clementine Bauch 11 500
Patricia Lebsack 11 500
Chelsey Dietrich 10 500
Mrs. Dennis Schulist 10 500
Kurtis Weissnat 10 500
Nicholas Runolfsdottir V 10 500
Glenna Reichert 10 500
Clementina DuBuque 10 500
Alice Johnson 0 500
Bob Smith 0 500
Charlie Green 0 500
Diana Prince 0 500
Ella Fitzgerald 0 500
Franklin Stone 0 500
Grace Wall 0 500
Hank McCoy 0 500
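
Every user shows 500 comments in the output above because the inner loop sums the comments of all posts. A sketch of one way to count only the comments on each user's own posts and then rank the users is given below. It works on plain lists of dictionaries, as `json.load` would return them; `most_engaged` is a hypothetical helper, the sample records are invented, and sorting by posts first and comments second is an assumption about how the two criteria should be combined.

```python
import json
from collections import Counter

def most_engaged(users, posts, comments, top=3):
    """Rank users by (posts written, comments received), highest first."""
    posts_per_user = Counter(post["userId"] for post in posts)
    # Map each post to its author so a comment can be credited to that user
    owner_of_post = {post["id"]: post["userId"] for post in posts}
    comments_per_user = Counter(
        owner_of_post[c["postId"]] for c in comments if c["postId"] in owner_of_post
    )
    ranked = sorted(
        users,
        key=lambda u: (posts_per_user[u["id"]], comments_per_user[u["id"]]),
        reverse=True,
    )
    return [
        {"User": u["name"], "Posts": posts_per_user[u["id"]],
         "Comments": comments_per_user[u["id"]]}
        for u in ranked[:top]
    ]

# Tiny hypothetical records shaped like users.json, posts.json and comments.json.
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Ben"}, {"id": 3, "name": "Cy"}]
posts = [{"id": 10, "userId": 1}, {"id": 11, "userId": 1}, {"id": 12, "userId": 2}]
comments = [{"postId": 10}, {"postId": 12}, {"postId": 12}]

result = most_engaged(users, posts, comments, top=2)
print(json.dumps(result, indent=2))
```

The returned list can be passed to `json.dump` to produce `most_engaged_users.json`.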


In [18]:
!cat files/users/most_engaged_users.json

"cat" is not recognized as an internal or external command,
operable program or batch file.


In [158]:
#print(df_use.columns)      this one is wrong, nice try
#print(df_pos.columns)
#print(df_com.columns)
df_merg = pd.merge(df_pos, df_use, left_on = "userId", right_on = "id")
#print(df_merg.columns)
df_tot_merg = pd.merge(df_merg, df_com, left_on= "id_x", right_on="postId")
print(df_tot_merg.columns)
for a, col in df_tot_merg.iterrows():
    name = col["name_x"]
    #post = col[""]



Index(['userId', 'id_x', 'title', 'body_x', 'id_y', 'name_x', 'username',
       'email_x', 'address', 'phone', 'website', 'company', 'postId', 'id',
       'name_y', 'email_y', 'body_y'],
      dtype='object')


In [159]:
df_tot_merg

Unnamed: 0,userId,id_x,title,body_x,id_y,name_x,username,email_x,address,phone,website,company,postId,id,name_y,email_y,body_y
0,1,1,sunt aut facere repellat provident occaecati e...,quia et suscipit\nsuscipit recusandae consequu...,1,Leanne Graham,Bret,Sincere@april.biz,"{'street': 'Kulas Light', 'suite': 'Apt. 556',...",1-770-736-8031 x56442,hildegard.org,"{'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...",1,1,id labore ex et quam laborum,Eliseo@gardner.biz,laudantium enim quasi est quidem magnam volupt...
1,1,1,sunt aut facere repellat provident occaecati e...,quia et suscipit\nsuscipit recusandae consequu...,1,Leanne Graham,Bret,Sincere@april.biz,"{'street': 'Kulas Light', 'suite': 'Apt. 556',...",1-770-736-8031 x56442,hildegard.org,"{'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...",1,2,quo vero reiciendis velit similique earum,Jayne_Kuhic@sydney.com,est natus enim nihil est dolore omnis voluptat...
2,1,1,sunt aut facere repellat provident occaecati e...,quia et suscipit\nsuscipit recusandae consequu...,1,Leanne Graham,Bret,Sincere@april.biz,"{'street': 'Kulas Light', 'suite': 'Apt. 556',...",1-770-736-8031 x56442,hildegard.org,"{'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...",1,3,odio adipisci rerum aut animi,Nikita@garfield.biz,quia molestiae reprehenderit quasi aspernatur\...
3,1,1,sunt aut facere repellat provident occaecati e...,quia et suscipit\nsuscipit recusandae consequu...,1,Leanne Graham,Bret,Sincere@april.biz,"{'street': 'Kulas Light', 'suite': 'Apt. 556',...",1-770-736-8031 x56442,hildegard.org,"{'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...",1,4,alias odio sit,Lew@alysha.tv,non et atque\noccaecati deserunt quas accusant...
4,1,1,sunt aut facere repellat provident occaecati e...,quia et suscipit\nsuscipit recusandae consequu...,1,Leanne Graham,Bret,Sincere@april.biz,"{'street': 'Kulas Light', 'suite': 'Apt. 556',...",1-770-736-8031 x56442,hildegard.org,"{'name': 'Romaguera-Crona', 'catchPhrase': 'Mu...",1,5,vero eaque aliquid doloribus et culpa,Hayden@althea.biz,harum non quasi et ratione\ntempore iure ex vo...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,10,100,at nam consequatur ea labore ea harum,cupiditate quo est a modi nesciunt soluta\nips...,10,Clementina DuBuque,Moriah.Stanton,Rey.Padberg@karina.biz,"{'street': 'Kattie Turnpike', 'suite': 'Suite ...",024-648-3804,ambrose.net,"{'name': 'Hoeger LLC', 'catchPhrase': 'Central...",100,496,et occaecati asperiores quas voluptas ipsam no...,Zola@lizzie.com,neque unde voluptatem iure\nodio excepturi ips...
496,10,100,at nam consequatur ea labore ea harum,cupiditate quo est a modi nesciunt soluta\nips...,10,Clementina DuBuque,Moriah.Stanton,Rey.Padberg@karina.biz,"{'street': 'Kattie Turnpike', 'suite': 'Suite ...",024-648-3804,ambrose.net,"{'name': 'Hoeger LLC', 'catchPhrase': 'Central...",100,497,doloribus dolores ut dolores occaecati,Dolly@mandy.co.uk,non dolor consequatur\nlaboriosam ut deserunt ...
497,10,100,at nam consequatur ea labore ea harum,cupiditate quo est a modi nesciunt soluta\nips...,10,Clementina DuBuque,Moriah.Stanton,Rey.Padberg@karina.biz,"{'street': 'Kattie Turnpike', 'suite': 'Suite ...",024-648-3804,ambrose.net,"{'name': 'Hoeger LLC', 'catchPhrase': 'Central...",100,498,dolores minus aut libero,Davion@eldora.net,aliquam pariatur suscipit fugiat eos sunt\nopt...
498,10,100,at nam consequatur ea labore ea harum,cupiditate quo est a modi nesciunt soluta\nips...,10,Clementina DuBuque,Moriah.Stanton,Rey.Padberg@karina.biz,"{'street': 'Kattie Turnpike', 'suite': 'Suite ...",024-648-3804,ambrose.net,"{'name': 'Hoeger LLC', 'catchPhrase': 'Central...",100,499,excepturi sunt cum a et rerum quo voluptatibus...,Wilburn_Labadie@araceli.name,et necessitatibus tempora ipsum quaerat invent...


### Geo-Post Analysis

Analyze where posts are coming from geographically. For each city where users reside, calculate the average number of posts made. Generate `city_post_avg.json` containing each city and its average post count.

In [77]:
#print(df_use["address"])


cont = 0
d = {}
l = []


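for uid in df_use['id']:
    n_posts = sum(df_pos['userId'] == uid)
    #print(uid, n_posts)
    city = df_use[df_use["id"] == uid]["address"].values[0]["city"]
    d[city] = n_posts   # NOTE: overwrites the count if two users share a city
    l.append(d)         # NOTE: appends the same dict object on every iteration
    tot += n_posts      # NOTE: `tot` is not initialized in this cell; it carries over from an earlier one
    #print(city, n_posts, tot)

# NOTE: every entry of `l` is the same dict, so each value is divided by `tot`
# once per entry; dividing by the total post count is also not a per-city average.
for i in l:
    for j, k in i.items():
        d[j] = k / tot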

print(l)
#for address in df_use["address"]:
#    if "city" in address:
#        print(address["city"])

[{'Gwenborough': 7.903629209358315e-60, 'Wisokyburgh': 7.244993441911789e-60, 'McKenziehaven': 7.244993441911789e-60, 'South Elvis': 7.244993441911789e-60, 'Roscoeview': 6.586357674465263e-60, 'South Christy': 6.586357674465263e-60, 'Howemouth': 6.586357674465263e-60, 'Aliyaview': 6.586357674465263e-60, 'Bartholomebury': 6.586357674465263e-60, 'Lebsackbury': 6.586357674465263e-60, 'Wonderland': 0.0, 'Bobsville': 0.0, 'Charlieville': 0.0, 'Themyscira': 0.0, 'Jazztown': 0.0, 'Rockplace': 0.0, 'Gracetown': 0.0, 'Mutantville': 0.0}, {'Gwenborough': 7.903629209358315e-60, 'Wisokyburgh': 7.244993441911789e-60, 'McKenziehaven': 7.244993441911789e-60, 'South Elvis': 7.244993441911789e-60, 'Roscoeview': 6.586357674465263e-60, 'South Christy': 6.586357674465263e-60, 'Howemouth': 6.586357674465263e-60, 'Aliyaview': 6.586357674465263e-60, 'Bartholomebury': 6.586357674465263e-60, 'Lebsackbury': 6.586357674465263e-60, 'Wonderland': 0.0, 'Bobsville': 0.0, 'Charlieville': 0.0, 'Themyscira': 0.0, 'Jazz
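
The values above are not usable averages. A sketch of one reading of the task follows, where the average for a city is the total number of posts by its residents divided by the number of resident users. `average_posts_per_city` is a hypothetical helper, and the records are invented but shaped like `users.json` and `posts.json`.

```python
from collections import defaultdict

def average_posts_per_city(users, posts):
    """Map each city to the mean number of posts per resident user."""
    posts_per_user = defaultdict(int)
    for post in posts:
        posts_per_user[post["userId"]] += 1

    per_city = defaultdict(list)  # city -> post counts of its residents
    for user in users:
        per_city[user["address"]["city"]].append(posts_per_user[user["id"]])

    return {city: sum(counts) / len(counts) for city, counts in per_city.items()}

# Hypothetical records; two users share a city to show the averaging.
users = [
    {"id": 1, "address": {"city": "Gwenborough"}},
    {"id": 2, "address": {"city": "Gwenborough"}},
    {"id": 3, "address": {"city": "Wonderland"}},
]
posts = [{"userId": 1}, {"userId": 1}, {"userId": 1}, {"userId": 2}]

city_avg = average_posts_per_city(users, posts)
print(city_avg)
```

Passing `city_avg` to `json.dump` would then produce `city_post_avg.json`.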

### Post Interconnectivity

Check if users mention other users in their posts (based on usernames). Generate a `user_mentions.json` file that lists for each user which other users they've mentioned the most across all their posts.
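
No solution is recorded for this exercise. A possible starting point is sketched below, assuming a mention is a username appearing as a whole token in the post body. `most_mentioned` is a hypothetical helper and the sample posts are invented; users who mention nobody map to `None`.

```python
import json
import re
from collections import Counter

def most_mentioned(users, posts):
    """Map each username to the username it mentions most often, or None."""
    username_by_id = {u["id"]: u["username"] for u in users}
    mentions = {name: Counter() for name in username_by_id.values()}
    for post in posts:
        author = username_by_id.get(post["userId"])
        if author is None:
            continue  # post written by an unknown user
        for other in username_by_id.values():
            if other == author:
                continue
            # Whole-token match so a short name does not match inside a longer one
            pattern = rf"(?<![\w.]){re.escape(other)}(?!\w)"
            mentions[author][other] += len(re.findall(pattern, post["body"]))
    result = {}
    for author, counter in mentions.items():
        top = counter.most_common(1)
        result[author] = top[0][0] if top and top[0][1] > 0 else None
    return result

# Hypothetical records shaped like users.json and posts.json.
users = [{"id": 1, "username": "AliceJ"}, {"id": 2, "username": "BobS"},
         {"id": 3, "username": "CharlieG"}]
posts = [
    {"userId": 2, "body": "AliceJ and CharlieG, hello. AliceJ was right."},
    {"userId": 1, "body": "No mentions here."},
]

user_mentions = most_mentioned(users, posts)
print(json.dumps(user_mentions, indent=2))
```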

# Bonus

## File Comparison Tool

Create a tool that takes in the paths of two text files and outputs the lines that differ between them.
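
One way to approach this is the standard-library `difflib` module. The sketch below keeps only the lines that `difflib.ndiff` marks as removed (`- `) or added (`+ `); `differing_lines` is a hypothetical helper, and the two temporary files exist only to make the example self-contained.

```python
import difflib
import os
import tempfile

def differing_lines(path_a, path_b):
    """Return lines that differ, prefixed with '- ' (only in A) or '+ ' (only in B)."""
    with open(path_a, "r") as fa, open(path_b, "r") as fb:
        a_lines = fa.read().splitlines()
        b_lines = fb.read().splitlines()
    diff = difflib.ndiff(a_lines, b_lines)
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those
    return [line for line in diff if line.startswith(("- ", "+ "))]

# Demonstration on two small temporary files.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")
    path_b = os.path.join(tmp, "b.txt")
    with open(path_a, "w") as f:
        f.write("one\ntwo\nthree\n")
    with open(path_b, "w") as f:
        f.write("one\n2\nthree\n")
    changes = differing_lines(path_a, path_b)

for line in changes:
    print(line)
```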

## File Metadata Extractor

Create a Python program that navigates through every file and folder in a given directory (recursively), extracts metadata from each file, and then saves this metadata to a CSV file. The metadata to capture for each file: File Name, File Path, File Size, Last Modified Date.
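
A minimal sketch of such a program, using `os.walk` for the recursive traversal and `csv.DictWriter` for the output, is shown below. The function names and the ISO date format are assumptions; the temporary directories only serve to demonstrate the calls.

```python
import csv
import os
import tempfile
from datetime import datetime

FIELDS = ["File Name", "File Path", "File Size", "Last Modified Date"]

def collect_metadata(root):
    """Walk root recursively and return one metadata dict per file."""
    rows = []
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(folder, filename)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime)
            rows.append({
                "File Name": filename,
                "File Path": path,
                "File Size": info.st_size,  # size in bytes
                "Last Modified Date": modified.isoformat(timespec="seconds"),
            })
    return rows

def write_metadata_csv(rows, csv_path):
    """Save the collected rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

# Demonstration: scan one temporary tree, write the CSV into another folder
# so the report does not list itself.
with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as out:
    os.makedirs(os.path.join(root, "sub"))
    for rel, text in [("a.txt", "hello"), (os.path.join("sub", "b.txt"), "hi")]:
        with open(os.path.join(root, rel), "w") as f:
            f.write(text)
    rows = collect_metadata(root)
    csv_path = os.path.join(out, "metadata.csv")
    write_metadata_csv(rows, csv_path)
    with open(csv_path, newline="") as f:
        saved = list(csv.DictReader(f))

print(len(saved), "files recorded")
```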

## Stack Overflow Survey Analysis

Unzip the datasets contained in `files/stackoverflow/`.

These datasets contain responses to the annual Stack Overflow survey, including various aspects like the type of developer, education, job satisfaction, and more.

**Objective**:
Analyze the Stack Overflow Developer Survey data to gain insights into the development community and its trends. This will involve integrating the yearly datasets, extracting relevant insights, visualizing the results, and creating a comprehensive PDF report.

**Tasks**:


1. File Discovery & Data Exploration:
    - Download and unzip the dataset folder.
    - Programmatically discover and read the yearly survey files.
    - Familiarize yourself with the structure and columns in the datasets.

2. Data Integration & Cleaning:
    - Handle missing values and inconsistencies between yearly datasets.
    - Integrate data for the past 5 years into one consolidated dataset, taking care of column differences and inconsistencies.

3. Analysis:
    - Identify the top 5 most popular programming languages for the past 5 years.
    - Calculate the median salary for developers in the US, Europe, and Asia for each year.
    - Determine the percentage of remote workers over the years.

4. Visualizations:
    - Plot the trends of the top 5 programming languages over the past 5 years.
    - Create a bar chart comparing the median salaries in the US, Europe, and Asia for each year.
    - Generate a pie chart showing the distribution of developers based on their highest level of formal education.

5. Generate PDF Report:
    - Create a comprehensive report detailing the above findings.
    - Include the charts generated in the visualizations step.
    - Save the report as `StackOverflow_Survey_Analysis.pdf`.
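
The file discovery step in task 1 could start from something like the sketch below. It assumes the datasets are `.zip` archives containing `.csv` files, which the text above does not state; `discover_survey_files`, the archive name `survey_2020.zip`, and the file `results.csv` are all hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

def discover_survey_files(folder):
    """Unzip every archive in folder, then return all CSV files found beneath it."""
    folder = Path(folder)
    for archive in sorted(folder.glob("*.zip")):
        target = folder / archive.stem  # extract each archive into its own subfolder
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return sorted(folder.rglob("*.csv"))

# Demonstration with a throwaway archive standing in for files/stackoverflow/.
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(Path(tmp) / "survey_2020.zip", "w") as zf:
        zf.writestr("results.csv", "Respondent,Country\n1,Spain\n")
    found = [p.relative_to(tmp).as_posix() for p in discover_survey_files(tmp)]

print(found)
```

Each discovered path could then be read with `pd.read_csv` and tagged with its year before the datasets are combined.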