# Analyzing your ego mail network
> This small project shows what information can be learned from analyzing your own mail.
> I will show you what I obtained by analyzing my own mailbox.
> All the code was written within one hour.
> Investing more time should give more insights :)

Done at this moment:
+ Importing the recipients and sender of each e-mail I stored
+ Drawing the relations (the edges) between who sends e-mail to whom

Not done at this moment:
- Did not do any text analysis
- Did not do any sentiment analysis (although I tend to store only positive e-mails)
- Did not do anything with the time it was sent
- Did not apply any centrality measures (yet)
- Did not deal with idlab@ilt.nl properly.

# Step 1

Export all your e-mails from Outlook to .msg files:
1. Select all your e-mails in Outlook
2. Drag them from Outlook to an empty folder
3. Repeat this for your Sent Items folder
4. Done

# Step 2

Import them into Python with the `extract_msg` package (installable via pip):

In [None]:
import os
import extract_msg

d = []
for file in os.scandir('/data/bruingjde/data/outlook/archief'):  # Iterate over every msg-file
    try:
        d.append(extract_msg.Message(file.path))
    except Exception:
        pass  # Skip files that extract_msg cannot parse (e.g. non-msg files)

## Get recipients

In [None]:
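receivers = []
for index, mail in enumerate(d):  # Iterate over every mail and append its recipients to the list
    for field in (mail.to, mail.cc):  # To and CC, so CC'd people are linked too; either may be None
        for split1 in (field or "").split("<"):  # Keep the address part of "Name <address>"
            for split2 in split1.split(">"):
                if "@" in split2:
                    receivers.append((index, split2))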
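Splitting on `<` and `>` works for strings like `Name <address>`. As an alternative, here is a minimal sketch that uses the standard library's `email.utils.getaddresses` instead. It assumes Outlook separates recipients with semicolons, which `getaddresses` does not split on, so they are replaced by commas first; lower-casing addresses to merge duplicates is also an assumption. The helper name `extract_addresses` is hypothetical.

```python
from email.utils import getaddresses


def extract_addresses(header):
    """Return the lower-cased e-mail addresses in a To/CC/From string.

    Assumes Outlook-style ';' separators (replaced by ',' because
    getaddresses only splits on commas). A missing header (None) gives [].
    """
    if not header:
        return []
    pairs = getaddresses([header.replace(";", ",")])  # [(display name, address), ...]
    # Lower-casing is an assumption: it merges "Jan@Example.nl" and "jan@example.nl"
    return [address.lower() for _name, address in pairs if "@" in address]


print(extract_addresses("Jan Jansen <Jan@Example.nl>; Piet <piet@example.nl>"))
```

Inside the recipients loop, `extract_addresses(mail.to) + extract_addresses(mail.cc)` would then give the address list of one mail.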

In [None]:
import pandas as pd
receivers = pd.DataFrame(receivers, columns=["mail_id","receiver"])
receivers.to_csv("/data/bruingjde/data/outlook/receivers.csv", index=False)
receivers

## Get senders

In [None]:
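senders = []
for mail in d:  # Iterate over every mail and append exactly one sender per mail
    addresses = [split2 for split1 in (mail.sender or "").split("<")
                 for split2 in split1.split(">") if "@" in split2]
    # Keep positions aligned with mail_id in receivers; None if no address was found
    senders.append(addresses[0] if addresses else None)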

In [None]:
senders = pd.Series(senders, name="senders")
#senders.to_pickle("/data/bruingjde/data/outlook/senders.pkl")
senders.to_csv("/data/bruingjde/data/outlook/senders.csv", header=False, index=False)

# Step 3
## Concept
We now project the data onto a network. A node is a unique e-mail address. An edge is only drawn between two individuals when a mail was sent from node A to node B (a mail I was also part of). Hence this is an **ego-network** of me. I will explain this with an example.

Imagine an **ego-network** around **me**. So we mined the mailbox of **me**.
Now **me** receives a mail from person **A**.

In [None]:
from pyvis.network import Network
net = Network(notebook=True, height="300px", width="300px")
net.add_node(0, label="me")
net.add_node(1, label="A")
net.add_edge(0, 1)
net.prep_notebook()
net.show("test.html")

Now **B** sends an e-mail to me. We obtain the following network.

In [None]:
net.add_node(2, label="B")
net.add_edge(0, 2)
net.show("test.html")

Now **C** sends an e-mail to me with **A** and **B** in the CC. We obtain the following network.

In [None]:
net.add_node(3, label="C")
net.add_edge(0, 3)  # C -> me
net.add_edge(1, 3)  # C -> A (CC)
net.add_edge(2, 3)  # C -> B (CC)
net.show("test.html")
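The rule behind these pictures can be written down without PyVis. Below is a minimal sketch, assuming each mail is reduced to a sender and a list of recipients (To and CC). Every recipient gets an undirected edge to the sender, while recipients are not linked to each other. The name `mail_edges` is hypothetical.

```python
def mail_edges(sender, recipients):
    # One undirected edge (a frozenset) between the sender and every recipient;
    # recipients of the same mail are NOT connected to each other.
    return [frozenset((sender, recipient)) for recipient in recipients]


example_mails = [
    ("A", ["me"]),            # A mails me
    ("B", ["me"]),            # B mails me
    ("C", ["me", "A", "B"]),  # C mails me with A and B in CC
]

edges = set()
for sender, recipients in example_mails:
    edges.update(mail_edges(sender, recipients))

print(sorted(sorted(edge) for edge in edges))
```

This yields the same five edges as the last network: me-A, me-B, me-C, A-C and B-C, with no edge between A and B.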

OK. Let's now focus on a *real* example: my own mailbox. Note that I keep only a very limited number of e-mails, for various reasons. First, load the exported data and build the network in NetworkX.

In [None]:
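# senders.csv was written without a header row, so header=None keeps the first sender as data
senders = pd.read_csv("/data/bruingjde/data/outlook/senders.csv", header=None).squeeze("columns")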
receivers = pd.read_csv("/data/bruingjde/data/outlook/receivers.csv")

In [None]:
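import networkx as nx
G = nx.MultiGraph()  # One edge per (sender, receiver) per mail; parallel edges are kept
for mail, sender in enumerate(senders):
    if pd.isna(sender):  # Skip mails without a sender address
        continue
    for _, receiver in receivers[receivers["mail_id"] == mail].iterrows():
        G.add_edge(sender, receiver["receiver"])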
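Filtering the whole `receivers` table once per mail gets slow for large mailboxes. A minimal sketch of the same join in a single pass follows. It groups recipients by `mail_id` first, assuming `senders[i]` belongs to mail `i` and that missing senders are `None` or NaN. The function name `build_edge_list` and the toy addresses are hypothetical.

```python
from collections import defaultdict


def build_edge_list(sender_per_mail, receiver_rows):
    """Return one (sender, receiver) tuple per receiver of every mail.

    sender_per_mail: sequence where position i holds the sender of mail i.
    receiver_rows: iterable of (mail_id, receiver) pairs.
    """
    receivers_by_mail = defaultdict(list)
    for mail_id, receiver in receiver_rows:  # group once instead of filtering per mail
        receivers_by_mail[mail_id].append(receiver)

    edge_list = []
    for mail_id, sender in enumerate(sender_per_mail):
        if not isinstance(sender, str):  # skips None and NaN (no sender address)
            continue
        edge_list.extend((sender, receiver) for receiver in receivers_by_mail[mail_id])
    return edge_list


toy_senders = ["a@example.nl", "me@example.nl", None]
toy_receivers = [(0, "me@example.nl"), (1, "a@example.nl"), (1, "b@example.nl"), (2, "me@example.nl")]
print(build_edge_list(toy_senders, toy_receivers))
```

With the notebook's data, the call would roughly be `build_edge_list(senders, zip(receivers["mail_id"], receivers["receiver"]))`. Passing the result to `G.add_edges_from(...)` on an `nx.MultiGraph` keeps one parallel edge per mail.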

Now add the nodes and edges to a *PyVis* object, using the number of mails between two addresses as the edge weight.

In [None]:
from collections import Counter
from pyvis.network import Network

net = Network(notebook=True, height="1000px", width="1000px")
net.prep_notebook()

# Map every e-mail address to an integer node id for PyVis
node_ids = {}
for idx, node in enumerate(G.nodes()):
    net.add_node(idx, label=node)
    node_ids[node] = idx

# Each parallel edge in the MultiGraph is one mail, so counting them gives the mails per pair
edge_counts = Counter(G.edges())
for (u, v), count in edge_counts.items():
    net.add_edge(node_ids[u], node_ids[v], value=count)  # value scales the edge width

net.show_buttons(filter_=['physics'])
net.show("test.html")
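The edge `value` is the number of mails between two addresses. If you count from a plain list of `(sender, receiver)` tuples rather than from the MultiGraph, a mail from A to B and one from B to A should count towards the same undirected pair. Below is a minimal stdlib sketch that normalises each pair by sorting it. The name `count_mail_pairs` and the toy addresses are hypothetical.

```python
from collections import Counter


def count_mail_pairs(edge_list):
    """Count mails per undirected address pair; (a, b) and (b, a) are merged."""
    return Counter(tuple(sorted(edge)) for edge in edge_list)


weights = count_mail_pairs([
    ("a@example.nl", "me@example.nl"),
    ("me@example.nl", "a@example.nl"),
    ("me@example.nl", "b@example.nl"),
])
print(weights.most_common())
```

Sorting the two addresses is just one way to get a canonical key. A `frozenset` would work too, but it collapses a self-sent mail into a single element.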