# Playing around with Facebook data

I downloaded my Facebook data and have been doing random things with it for a while.

## Extracting messages

The first thing I worked on was my messages, since they make up most of my data. The Facebook data archive stores messages as HTML, so I had to parse the HTML files. Within each file, messages are listed in reverse chronological order (latest first). Each thread (a one-to-one or group chat) is stored in a file with an arbitrary name rather than a username, group name, or chat ID.
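Because the thread files have arbitrary names, it helps to find out who is in each one before doing anything else. The sketch below lists the distinct senders per file. It assumes every thread file uses the same `span.user` markup as the extraction snippet in this post and that all thread files sit in one folder. `thread_participants` and `label_threads` are hypothetical helper names. People who are in a chat but never posted in it will not show up.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def thread_participants(html):
    """Return the sorted, de-duplicated sender names found in one thread's HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumes each sender name sits in a <span class="user">, as in the extraction snippet.
    return sorted({span.text for span in soup.find_all('span', class_='user')})


def label_threads(directory):
    """Map each arbitrarily named *.html file in `directory` to its senders."""
    return {
        path.name: thread_participants(path.read_text(encoding='utf-8'))
        for path in sorted(Path(directory).glob('*.html'))
    }
```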

The following snippet extracts the messages from a single chat's HTML file, puts them back in chronological order, and writes them to a CSV file. It should be easy to extend to all of your chats.

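```python
import pandas
from bs4 import BeautifulSoup

filename = 'chat.html'

with open(filename, encoding='utf-8') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

# Each div.message holds the sender and timestamp; the message body is
# the element that follows it.
message_headers = soup.find_all('div', class_='message')
# The archive lists the newest messages first, so reverse to get oldest first.
message_headers.reverse()
messages = [
    {
        'user': message_header.find('span', class_='user').text,
        'date': message_header.find('span', class_='meta').text,
        # find_next_sibling() skips whitespace text nodes, which
        # next_sibling would return instead of the message body.
        'message': message_header.find_next_sibling().text,
    }
    for message_header in message_headers]

df = pandas.DataFrame(messages)
df.to_csv('messages.csv', encoding='utf-8', index=False)
```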
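One way to extend this to every chat is to wrap the per-file logic in a function and loop over a folder of thread files. The sketch below assumes all thread files share the markup handled above and live in a single directory. `parse_chat`, `parse_all_chats`, and the `thread` column are hypothetical names. File names are arbitrary, so the file stem is used only as a stable thread ID, not as a readable chat name.

```python
from pathlib import Path

import pandas
from bs4 import BeautifulSoup


def parse_chat(html, thread):
    """Return one dict per message, oldest first, tagged with a thread id."""
    soup = BeautifulSoup(html, 'html.parser')
    message_headers = soup.find_all('div', class_='message')
    message_headers.reverse()  # the archive lists the newest messages first
    messages = []
    for message_header in message_headers:
        # Next tag sibling (whitespace text nodes are skipped); may be None.
        body = message_header.find_next_sibling()
        messages.append({
            'thread': thread,
            'user': message_header.find('span', class_='user').text,
            'date': message_header.find('span', class_='meta').text,
            'message': body.text if body is not None else '',
        })
    return messages


def parse_all_chats(directory):
    """Parse every *.html file in `directory` into a single DataFrame."""
    rows = []
    for path in sorted(Path(directory).glob('*.html')):
        # The file stem serves only as an opaque thread id.
        rows.extend(parse_chat(path.read_text(encoding='utf-8'), path.stem))
    return pandas.DataFrame(rows, columns=['thread', 'user', 'date', 'message'])
```

With the thread files in a folder such as `messages/` (a placeholder path), `parse_all_chats('messages').to_csv('all_messages.csv', encoding='utf-8', index=False)` writes every chat to one CSV file, where the `thread` column tells the conversations apart.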