# Facebook Data Miner Command-Line Interface

This notebook has two main purposes:
1. to show off the CLI by example,
2. to serve as a reference for testing the CLI.

The application has a well-defined CLI. The number of function combinations we wanted to expose was far too large to handle comfortably with [Click](https://click.palletsprojects.com/en/7.x/) and its decorators. Instead we went for the simple-stupid [Python Fire](https://google.github.io/python-fire/guide/). I call it simple-stupid because, besides the functions we wanted to expose and some facade classes, the only thing we had to do was pass the application's main entry point to Python Fire like this:
```
app = App(DATA_PATH)
Fire(app, name='Facebook-Data-Miner')
```

Be aware that every cell starts a new CLI instance, so with your own Facebook data (not the test data) this notebook may take *an hour or more* to run. Until this app uses a database to store your data, CLI calls may be painfully slow.

**However**, `Python Fire` can run a CLI app in interactive mode. To use it, refer to the [docs](https://google.github.io/python-fire/using-cli/#python-fires-flags). In short, you want to run something like `./miner/cli.py analyzer -- --interactive`.

Happy coding!

Let's first set up the correct working directory. This is needed for the CLI to work. In a shell environment you would usually run something like the following in the root folder of this project:
```
export PYTHONPATH="$PWD"
```

In [1]:
import os
BASE_PATH = None

In [2]:
if not BASE_PATH:
    BASE_PATH = os.path.dirname(os.path.abspath(os.getcwd()))
BASE_PATH

'/home/levente/projects/facebook-data-miner'

In [3]:
try:
    os.chdir(BASE_PATH)
    print(f"OK! Changed to: {os.getcwd()} directory.")
except OSError:
    # os.chdir raises OSError (e.g. FileNotFoundError) if the path is unusable.
    print(f"WARNING! Couldn't change directory. Current is: {os.getcwd()}")

OK! Changed to: /home/levente/projects/facebook-data-miner directory.


Make the CLI file executable.

In [4]:
!chmod +x miner/cli.py # TODO what about windows

## Setting data path

In order to run this notebook, you need to specify the data path where your Facebook data is. Search for the file `configuration.yml` in the root of this project, and set the `DATA_PATH` variable either to the zip file's absolute path, or if you have already extracted it, the absolute path of the data directory.

## Features of the CLI

Now we can start with the CLI tool. Python Fire takes an object and lets the user call all of the object's public methods (in Python that means methods whose names do not start with an underscore).
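
To make the "public method" rule concrete, here is a small sketch of how such methods can be listed for an object. It is not Python Fire's actual implementation, and `DemoApp` and `public_commands` are hypothetical names used only for illustration.

```python
import inspect


class DemoApp:
    """Hypothetical stand-in for the real application object."""

    def friends(self):
        return "friends data"

    def conversations(self):
        return "conversation data"

    def _load(self):
        # Leading underscore: treated as private, so it is not exposed.
        return "internal"


def public_commands(obj):
    """Return the names of callable attributes that do not start with an underscore."""
    return sorted(
        name
        for name, _member in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    )


print(public_commands(DemoApp()))
```

Only `friends` and `conversations` are listed; `_load` and all dunder methods are filtered out by the underscore check.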

In this notebook we will cover the following groups of methods:
- friends,
- conversations,
- analyzer,
- people,
- report,
- plot.

This means you can get detailed information on each group by adding its keyword after the name of the file you are calling, that is, `./miner/cli.py`. Let's get the help for the main entry point (NOTE: Python Fire pipes the output of `help` into an interactive pager, so the output is cut off after one screen).

In [5]:
!./miner/cli.py --help

INFO: Showing help with the command 'Facebook-Data-Miner -- --help'.

NAME
    Facebook-Data-Miner

SYNOPSIS
    Facebook-Data-Miner COMMAND

COMMANDS
    COMMAND is one of the following:

     analyzer

     conversations
       @param kind: @param channels: @param cols: @param output: @return:

     friends
       @param sort: a @param dates: b @param output: c @return: list of friends, sorted by @sort, with dates of making friend if @dates is True, saved in a csv or json file if @output is a valid path.

     people

     plot

     report


### Friends

In [6]:
!./miner/cli.py friends --help

INFO: Showing help with the command 'Facebook-Data-Miner friends -- --help'.

NAME
    Facebook-Data-Miner friends - @param sort: a @param dates: b @param output: c @return: list of friends, sorted by @sort, with dates of making friend if @dates is True, saved in a csv or json file if @output is a valid path.

SYNOPSIS
    Facebook-Data-Miner friends <flags>

DESCRIPTION
    @param sort: a @param dates: b @param output: c @return: list of friends, sorted by @sort, with dates of making friend if @dates is True, saved in a csv or json file if @output is a valid path.

FLAGS
    --sort=SORT
    --dates=DATES
    --output=OUTPUT


As you can see, `friends` has no further commands you can chain after it. The description should make it clear what these parameters/flags do. Let's first use it without flags.

In [7]:
!./miner/cli.py friends

timestamp,name
2020-02-06 15:25:00,Guy Fawkes
2020-02-06 15:26:40,Daisy Duck
2020-02-06 15:29:01,Bugs Bunny
2020-02-12 17:01:52,Dér Dénes
2020-02-21 14:07:59,Tőke Hal
2020-03-14 21:54:52,Foo Bar
2020-03-15 20:18:28,Szett Droxler
2020-04-09 21:42:05,Donald Duck
2020-05-28 15:41:59,John Doe



In [8]:
!./miner/cli.py friends --sort=name

timestamp,name
2020-02-06 15:29:01,Bugs Bunny
2020-02-06 15:26:40,Daisy Duck
2020-04-09 21:42:05,Donald Duck
2020-02-12 17:01:52,Dér Dénes
2020-03-14 21:54:52,Foo Bar
2020-02-06 15:25:00,Guy Fawkes
2020-05-28 15:41:59,John Doe
2020-03-15 20:18:28,Szett Droxler
2020-02-21 14:07:59,Tőke Hal



In [9]:
!./miner/cli.py friends --sort=name --dates=False

,name
0,Bugs Bunny
1,Daisy Duck
2,Donald Duck
3,Dér Dénes
4,Foo Bar
5,Guy Fawkes
6,John Doe
7,Szett Droxler
8,Tőke Hal



Notice that this output is formatted as CSV. You can also format it as JSON: just pass `json` as the value of the `--output` flag.

In [10]:
!./miner/cli.py friends --sort=name --output=json

{"name":{"1581002941000":"Bugs Bunny","1581002800000":"Daisy Duck","1586468525000":"Donald Duck","1581526912000":"D\u00e9r D\u00e9nes","1584222892000":"Foo Bar","1581002700000":"Guy Fawkes","1590680519000":"John Doe","1584303508000":"Szett Droxler","1582294079000":"T\u0151ke Hal"}}
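
The keys in this JSON look like Unix timestamps in milliseconds. The sketch below decodes them back into the readable form seen in the CSV output. It assumes the keys are UTC epoch milliseconds, which matches the sample values here; `decode_friends` is a hypothetical helper, not part of the application.

```python
import json
from datetime import datetime, timezone

# Two entries copied from the JSON output above.
raw = '{"name":{"1581002941000":"Bugs Bunny","1581002800000":"Daisy Duck"}}'


def decode_friends(payload):
    """Turn {"name": {epoch_ms: name}} into (timestamp string, name) pairs."""
    by_timestamp = json.loads(payload)["name"]
    rows = []
    for epoch_ms, name in by_timestamp.items():
        # Assumption: keys are milliseconds since the Unix epoch, in UTC.
        moment = datetime.fromtimestamp(int(epoch_ms) / 1000, tz=timezone.utc)
        rows.append((moment.strftime("%Y-%m-%d %H:%M:%S"), name))
    return rows


for stamp, name in decode_friends(raw):
    print(stamp, name)
```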


We can also write this to an `output` file instead of `stdout`.

In [11]:
!./miner/cli.py friends --sort=name --output=$PWD/notebooks/out.csv

Data was written to /home/levente/projects/facebook-data-miner/notebooks/out.csv


In [12]:
!cat $PWD/notebooks/out.csv

timestamp,name
2020-02-06 15:29:01,Bugs Bunny
2020-02-06 15:26:40,Daisy Duck
2020-04-09 21:42:05,Donald Duck
2020-02-12 17:01:52,Dér Dénes
2020-03-14 21:54:52,Foo Bar
2020-02-06 15:25:00,Guy Fawkes
2020-05-28 15:41:59,John Doe
2020-03-15 20:18:28,Szett Droxler
2020-02-21 14:07:59,Tőke Hal


Or you can write it to a JSON file: just pass a filename that ends with `.json`.

In [13]:
!./miner/cli.py friends --sort=name --output=$PWD/notebooks/out.json

Data was written to /home/levente/projects/facebook-data-miner/notebooks/out.json


In [14]:
!cat $PWD/notebooks/out.json

{"name":{"1581002941000":"Bugs Bunny","1581002800000":"Daisy Duck","1586468525000":"Donald Duck","1581526912000":"D\u00e9r D\u00e9nes","1584222892000":"Foo Bar","1581002700000":"Guy Fawkes","1590680519000":"John Doe","1584303508000":"Szett Droxler","1582294079000":"T\u0151ke Hal"}}

Let's clean up.

In [15]:
!rm $PWD/notebooks/out.csv
!rm $PWD/notebooks/out.json

### Conversations
Conversations is also an interface that points to a single function. So let's dive into it.

In [16]:
!./miner/cli.py conversations --help

INFO: Showing help with the command 'Facebook-Data-Miner conversations -- --help'.

NAME
    Facebook-Data-Miner conversations - @param kind: @param channels: @param cols: @param output: @return:

SYNOPSIS
    Facebook-Data-Miner conversations <flags>

DESCRIPTION
    @param kind: @param channels: @param cols: @param output: @return:

FLAGS
    --kind=KIND
    --channels=CHANNELS
    --cols=COLS
    --output=OUTPUT


In [17]:
!./miner/cli.py conversations

timestamp_ms,sender_name,content,type,partner,videos,audio_files,photos,gifs,reactions,files
2014-09-24 17:02:08.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,,,
2014-11-09 19:56:46.047,Jenő Rejtő,older stuff,Generic,Tőke Hal,,,,,,
2014-11-09 20:13:26.047,Tőke Hal,testing multiple files,Generic,Tőke Hal,,,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,yo,Generic,Tőke Hal,,,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,,,
2014-11-10 12:20:06.047,Tőke Hal,yo,Generic,Tőke Hal,,,,,,
2014-11-10 12:21:46.047,Tőke Hal,zup,Generic,Tőke Hal,,,,,,
2014-11-10 12:21:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,,,
2014-11-10 12:26:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,,,
2014-11-10 12:30:45.145,Jenő Rejtő,not much,Generic,Tőke Hal,,,,,,
2014-11-22 02:17:25.145,Jenő Rejtő,,Generic,Bugs Bunny,,,[{'uri': 'messages/inbox/TeflonMusk_fSD454F/photos/index.jpeg'}],,,
2014-12-03 16:07:25.145,Jenő Rejtő,not,Generic,Tőke Hal,,,,,,
2014-12-26 20:01:46.

As you can see, calling this node of the interface prints all the private conversation data we have. We don't see it here, but the parameter `kind` internally defaults to `private`. We can change this to `group`.

In [18]:
!./miner/cli.py conversations --kind=group

timestamp_ms,sender_name,content,type,partner,photos,gifs
2011-07-17 15:00:06.580,Donald Duck,test,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:00:08.580,Jenő Rejtő,test,Generic,"Foo Bar, John Doe and Bugs Bunny",,
2011-07-17 15:00:13.721,Foo Bar,what do you test,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:00:13.721,Foo Bar,what do you test,Generic,"Foo Bar, John Doe and Bugs Bunny",,
2011-07-17 15:00:32.011,Tőke Hal,basic group messages,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:00:32.012,Dér Dénes,blabla,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:00:32.012,Bugs Bunny,basic group messages,Generic,"Foo Bar, John Doe and Bugs Bunny",,
2011-07-17 15:02:54.237,John Doe,ok,Generic,"Foo Bar, John Doe and Bugs Bunny",,
2011-07-17 15:02:54.237,Facebook User,ok,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2018-04-19 12:31:42.152,Jenő Rejtő,marathon?,Generic,marathon,,
2018-04-19 

You can also filter the conversations down to specific channels. Note that you can pass several values to one flag, but the separator between them is rather clunky: `;!;`. The reason is that group names can contain almost any character, including commas, so a common delimiter would be ambiguous. This unusual combination is very unlikely to occur in a real name, so the separation works in practically every case.
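
The following sketch shows why a comma would not work as the separator, using a group name from the test data. `split_values` is a hypothetical helper written for this illustration; the application's own parsing may differ.

```python
SEPARATOR = ";!;"


def split_values(raw):
    """Split a multi-value flag on the separator, skipping empty items."""
    return [part for part in raw.split(SEPARATOR) if part]


# Two channel names passed in one flag value.
print(split_values("Foo Bar;!;Teflon Musk"))

# A group name that contains commas stays in one piece...
print(split_values("Foo Bar, John Doe and Bugs Bunny"))

# ...whereas splitting on a comma would cut it apart.
print("Foo Bar, John Doe and Bugs Bunny".split(","))
```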

In [19]:
!./miner/cli.py conversations --channels="Foo Bar;!;Teflon Musk"

timestamp_ms,sender_name,content,type,videos,audio_files,photos,gifs,reactions,files,partner
2020-02-13 06:15:28.715,Jenő Rejtő,Lorem lorim.. foo bar 😡😡😡,Generic,,,,,,,Foo Bar
2020-02-13 06:15:38.715,Foo Bar,Ut akar ... consequat. oO wow :P xd :D,Generic,,,,,"[{'reaction': '❤', 'actor': 'Jenő Rejtő'}]",,Foo Bar
2020-02-14 01:42:08.145,Jenő Rejtő,,Generic,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/photos/blueberry-5417154_960_720.jpg'}],,,,Foo Bar
2020-02-14 04:28:48.047,Foo Bar,,Generic,,,,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/files/1810.04805.pdf'}],Foo Bar
2020-02-14 12:48:48.047,Jenő Rejtő,Duis duia .. ! xdddddd :D,Generic,,,,,,,Foo Bar
2020-02-14 15:35:28.047,Foo Bar,Excepteur...laborum. :D,Generic,,,,,,,Foo Bar
2020-02-14 18:22:08.145,Jenő Rejtő,,Generic,,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/gifs/97999627_1419172538270405_8596479473619042304_n_2963870430335255.gif'}],"[{'reaction': '😮', 'actor': 'Foo Bar'}]",,Foo Bar
2020-02-18 00:08:48.047,Jenő Rejtő,What the hac

You can also restrict the output to certain columns by listing the column names you want.

In [20]:
!./miner/cli.py conversations --channels="Foo Bar" --cols='sender_name;!;type;!;content'

timestamp_ms,sender_name,type,content
2020-02-13 06:15:28.715,Jenő Rejtő,Generic,Lorem lorim.. foo bar 😡😡😡
2020-02-13 06:15:38.715,Foo Bar,Generic,Ut akar ... consequat. oO wow :P xd :D
2020-02-14 01:42:08.145,Jenő Rejtő,Generic,
2020-02-14 04:28:48.047,Foo Bar,Generic,
2020-02-14 12:48:48.047,Jenő Rejtő,Generic,Duis duia .. ! xdddddd :D
2020-02-14 15:35:28.047,Foo Bar,Generic,Excepteur...laborum. :D
2020-02-14 18:22:08.145,Jenő Rejtő,Generic,
2020-02-18 00:08:48.047,Jenő Rejtő,Generic,What the hack? xdddddd :D
2020-02-18 08:28:48.145,Foo Bar,Generic,
2020-02-26 13:42:08.145,Jenő Rejtő,Generic,
2020-03-09 11:48:48.047,Foo Bar,Generic,
2020-04-02 20:08:48.047,Jenő Rejtő,Generic,Whet? Check this! :P
2020-04-25 23:42:08.047,Jenő Rejtő,Generic,
2020-05-03 12:15:28.123,Foo Bar,Generic,OUT!
2020-08-08 20:22:08.321,Jenő Rejtő,Generic,OUT! ❤



The `output` parameter works the same way as for the `friends` node. There are four possibilities: CSV or JSON format, written either to a file or to stdout.
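
The four cases can be summarised in a small decision function. This is a sketch inferred from the examples in this notebook, not the application's source: `resolve_output` is a hypothetical name, and treating the literal value `csv` as "CSV on stdout" is an assumption.

```python
import os


def resolve_output(output=None):
    """Map the value of --output to a (format, destination) pair."""
    # No value (or, by assumption, the literal "csv"): CSV printed to stdout.
    if output is None or output == "csv":
        return ("csv", "stdout")
    # The literal "json": JSON printed to stdout.
    if output == "json":
        return ("json", "stdout")
    # Anything else is treated as a file path; the extension picks the format.
    extension = os.path.splitext(output)[1].lower()
    return ("json" if extension == ".json" else "csv", output)


for value in (None, "json", "notebooks/out.csv", "notebooks/out.json"):
    print(value, "->", resolve_output(value))
```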

### Messaging Analyzer
Now we have reached the most complex node of this CLI, so let's start with the help message. As you will see, this only prints out information on the `analyzer` function's signature.

In [21]:
!./miner/cli.py analyzer --help

NAME
    Facebook-Data-Miner analyzer --help

SYNOPSIS
    Facebook-Data-Miner analyzer --help - COMMAND

COMMANDS
    COMMAND is one of the following:

     get_who_i_have_private_convo_with_from_a_group

     how_much_i_speak_in_private_with_group_members

     is_priv_msg_first_then_group

     people_i_have_group_convo_with

     people_i_have_private_convo_with


The function has two distinct outcomes depending on whether you provide a value for the `kind` parameter.

**If not**, you get the methods of the rather high-level `MessagingAnalyzer` object, which analyze messages across both private and group channels.

**Otherwise**, if `kind` has the value `private` or `group`, you get a facade object over the lower-level `{Private|Group}MessagingAnalyzer` and `{Private|Group}ConversationStats` objects. This facade has almost 50 methods, and many of them take parameters as well. We will cover all of the functions for the sake of completeness, and also because this notebook serves as a reference for tests. This will be a really long part of the notebook, but it contains most of the information. However, if you want to skip to the [goodies](#report), like reports and plots, go ahead.

Let's start with the first case: no value provided for `kind`.

In [22]:
!./miner/cli.py analyzer

NAME
    Facebook-Data-Miner analyzer

SYNOPSIS
    Facebook-Data-Miner analyzer - COMMAND

COMMANDS
    COMMAND is one of the following:

     get_who_i_have_private_convo_with_from_a_group

     how_much_i_speak_in_private_with_group_members

     is_priv_msg_first_then_group

     people_i_have_group_convo_with

     people_i_have_private_convo_with


Let's go over these methods. 

**IMPORTANT NOTE**: as you have seen, the `analyzer` function has quite a lot of input parameters, all of which can of course be left empty. To tell `Python Fire` that you don't want to fill in those values, you have to use a separator. The default is a dash (`-`), but you can change this to almost any character (see [this](https://google.github.io/python-fire/guide/#calling-functions) for reference).

In [23]:
!./miner/cli.py analyzer - people_i_have_group_convo_with --help # NOTE the single `-` after the analyzer

INFO: Showing help with the command 'Facebook-Data-Miner analyzer - people_i_have_group_convo_with -- --help'.

NAME
    Facebook-Data-Miner analyzer people_i_have_group_convo_with

SYNOPSIS
    Facebook-Data-Miner analyzer - people_i_have_group_convo_with -


Note also that you can use `--help` with the functions as well. `Python Fire` will read the signature and the docstring of the function and create a help message from them.

In [24]:
!./miner/cli.py analyzer - people_i_have_private_convo_with

Tőke Hal
Foo Bar
Bugs Bunny
Benedek Elek


In [25]:
!./miner/cli.py analyzer - people_i_have_group_convo_with

Tőke Hal
Jenő Rejtő
Dér Dénes
Facebook User
Donald Duck
Foo Bar
Bugs Bunny
John Doe


In [26]:
!./miner/cli.py analyzer - get_who_i_have_private_convo_with_from_a_group --group_name=marathon

Bugs Bunny
Foo Bar


In [27]:
!./miner/cli.py analyzer - how_much_i_speak_in_private_with_group_members --group_name=marathon

Bugs Bunny: 6
Foo Bar:    15


In [28]:
!./miner/cli.py analyzer - is_priv_msg_first_then_group --name='Foo Bar'

True


#### Private and Group Messaging Analyzer
Both `private` and `group` are created with the very same class, but because of minor differences in the inner structure of the two channel types, some methods make more sense for one than for the other. We will cover this soon.

Now let's start with a help message.

In [29]:
!./miner/cli.py analyzer private - --help

INFO: Showing help with the command 'Facebook-Data-Miner analyzer private - -- --help'.

NAME
    Facebook-Data-Miner analyzer private

SYNOPSIS
    Facebook-Data-Miner analyzer private - COMMAND

COMMANDS
    COMMAND is one of the following:

     all_channels
       @param name: a partner name. @return: all channels for this partner (private and groups).

     audios

     average_word_length

     cc

     channels

     contributors
       @return:

     created_by_me

     creator

     end

     files

     get_grouped_time_series_data

     gifs

     is_group

     max_channel_size

     mc

     mean_channel_size

     media

     media_mc

     message_language_map
       @return:

     message_language_ratio

     messages

     min_channel_size

     most_used_msgs

     most_used_words

     number_of_channels

     number_of_contributors

     number_of_convos_created_by_me

     participant_to_channel_map

     participants

And one for `group` as well; delete the `#` sign if you want to verify.

In [30]:
#!./miner/cli.py analyzer group --help

As you can see, both man pages look the same.

As described above the facade we get exposes methods from both the higher-level MessagingAnalyzer and the lower-level ConversationStats. So in this section we will dive deep into these functionalities.

In [31]:
!./miner/cli.py analyzer private - is_group

False


In [32]:
!./miner/cli.py analyzer group - is_group

True


As we anticipated. Now let's list all the channels in the two groups. A channel is a conversation on Messenger, either with one person or in a group.

In [33]:
!./miner/cli.py analyzer private - channels

Bugs Bunny
Tőke Hal
Benedek Elek
Foo Bar


In [34]:
!./miner/cli.py analyzer group - channels

Tőke Hal, Foo Bar, Donald Duck and 2 others
Foo Bar, John Doe and Bugs Bunny
marathon


You can also get the number of these channels.

In [35]:
!./miner/cli.py analyzer group - number_of_channels

3


Now all the participants of these channels.

In [36]:
!./miner/cli.py analyzer private - participants

Benedek Elek
Bugs Bunny
Foo Bar
Jenő Rejtő
Tőke Hal


In [37]:
!./miner/cli.py analyzer group - participants

Bugs Bunny
Donald Duck
Dér Dénes
Facebook User
Foo Bar
Jenő Rejtő
John Doe
Tőke Hal


These are all the people who are in the above channels. Not every one of them contributed to the channels, though. To show this, we have to narrow the group messages down to a single channel in which one test user (`Bugs Bunny` in the current test data) has not contributed.

In [38]:
!./miner/cli.py analyzer group --channels=marathon - participants

Bugs Bunny
Donald Duck
Foo Bar
Jenő Rejtő


In [39]:
!./miner/cli.py analyzer group --channels=marathon - contributors

Jenő Rejtő
Foo Bar
Donald Duck


As you can see the number of contributors is only 3, as opposed to participants, which is 4. Note that you can use the `number_of_contributors` function as well.
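
The relationship between the two can be expressed with sets: contributors are the participants who sent at least one message. The sketch below uses the `marathon` participants from the output above; the list of senders is shortened and illustrative.

```python
# Participants of the `marathon` channel, as printed above.
participants = {"Bugs Bunny", "Donald Duck", "Foo Bar", "Jenő Rejtő"}

# Sender of each message in the channel (shortened, illustrative list).
senders = ["Jenő Rejtő", "Foo Bar", "Jenő Rejtő", "Donald Duck", "Donald Duck"]

# Contributors: everyone who appears as a sender at least once.
contributors = set(senders)
# Silent members: in the channel, but never sent anything.
silent = participants - contributors

print(sorted(contributors))
print(sorted(silent))
```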

Next we have the `participant_to_channel_map` data structure, which records which person is in which channel. In `private` it's fairly straightforward (if not redundant), as the channel name and the participant are exactly the same, but for `group` this can be really useful information.

In [40]:
!./miner/cli.py analyzer private - participant_to_channel_map

Tőke Hal:     ["Tőke Hal"]
Jenő Rejtő:   ["Benedek Elek", "Foo Bar", "Tőke Hal", "Bugs Bunny"]
Foo Bar:      ["Foo Bar"]
Bugs Bunny:   ["Bugs Bunny"]
Benedek Elek: ["Benedek Elek"]


In [41]:
!./miner/cli.py analyzer group - participant_to_channel_map

Tőke Hal:      ["Tőke Hal, Foo Bar, Donald Duck and 2 others"]
Jenő Rejtő:    ["Foo Bar, John Doe and Bugs Bunny", "marathon", "Tőke Hal, Foo Bar, Donald Duck and 2 others"]
Dér Dénes:     ["Tőke Hal, Foo Bar, Donald Duck and 2 others"]
Facebook User: ["Tőke Hal, Foo Bar, Donald Duck and 2 others"]
Donald Duck:   ["marathon", "Tőke Hal, Foo Bar, Donald Duck and 2 others"]
Foo Bar:       ["Foo Bar, John Doe and Bugs Bunny", "marathon", "Tőke Hal, Foo Bar, Donald Duck and 2 others"]
Bugs Bunny:    ["Foo Bar, John Doe and Bugs Bunny", "marathon"]
John Doe:      ["Foo Bar, John Doe and Bugs Bunny"]
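
A mapping like this can be built by inverting a channel-to-members mapping. The sketch below does so for two of the three group channels. `invert` is a hypothetical helper, and the member lists are read off the output above rather than taken from the application's internals.

```python
from collections import defaultdict

# Members per channel, read off the output above (two of the three channels).
channel_members = {
    "marathon": ["Jenő Rejtő", "Donald Duck", "Foo Bar", "Bugs Bunny"],
    "Foo Bar, John Doe and Bugs Bunny": ["Jenő Rejtő", "Foo Bar", "John Doe", "Bugs Bunny"],
}


def invert(members_by_channel):
    """Build {participant: [channels]} from {channel: [participants]}."""
    channels_by_participant = defaultdict(list)
    for channel, members in members_by_channel.items():
        for member in members:
            channels_by_participant[member].append(channel)
    return dict(channels_by_participant)


mapping = invert(channel_members)
print(mapping["Bugs Bunny"])
print(mapping["John Doe"])
```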


Next let's check the number of conversations created by our test user.

In [42]:
!./miner/cli.py analyzer private - number_of_convos_created_by_me

4


In [43]:
!./miner/cli.py analyzer group - number_of_convos_created_by_me

2


We also have `max_channel_size`, `mean_channel_size`, and `min_channel_size`. Again, for `private` all of these will be 2, but for `group` it is more interesting.

In [44]:
!./miner/cli.py analyzer group - min_channel_size

4


In [45]:
!./miner/cli.py analyzer group - mean_channel_size

4.666666666666667


In [46]:
!./miner/cli.py analyzer group - max_channel_size

6
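
These three numbers can be reproduced by hand from the `participant_to_channel_map` output for `group`. The member counts below are tallied from that output and include the test user.

```python
from statistics import mean

# Member counts per group channel, tallied from participant_to_channel_map.
channel_sizes = {
    "Tőke Hal, Foo Bar, Donald Duck and 2 others": 6,
    "Foo Bar, John Doe and Bugs Bunny": 4,
    "marathon": 4,
}

sizes = list(channel_sizes.values())
print("min: ", min(sizes))
print("mean:", mean(sizes))  # (6 + 4 + 4) / 3
print("max: ", max(sizes))
```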


We can get all the channels for one conversation partner of ours, which is again a more useful feature for `group` convos.

In [47]:
!./miner/cli.py analyzer group - all_channels --name='Foo Bar'

Foo Bar, John Doe and Bugs Bunny
marathon
Tőke Hal, Foo Bar, Donald Duck and 2 others


In [48]:
!./miner/cli.py analyzer group - all_channels --name='John Doe'

Foo Bar, John Doe and Bugs Bunny


Another nice function is `ranking_by_statistic`. You can rank the participants of the conversations by some statistic.

In [49]:
!./miner/cli.py analyzer private - ranking_by_statistic

Foo Bar:      48.38709677419355
Tőke Hal:     22.580645161290324
Bugs Bunny:   19.35483870967742
Benedek Elek: 9.67741935483871


In [50]:
!./miner/cli.py analyzer group - ranking_by_statistic

Donald Duck:   33.333333333333336
Jenő Rejtő:    22.22222222222222
Foo Bar:       16.666666666666668
Bugs Bunny:    5.555555555555555
Dér Dénes:     5.555555555555555
Facebook User: 5.555555555555555
John Doe:      5.555555555555555
Tőke Hal:      5.555555555555555


Note that this function has some parameters. Let's see the manual for this function.

In [51]:
!./miner/cli.py analyzer group - ranking_by_statistic --help

INFO: Showing help with the command 'Facebook-Data-Miner analyzer group - ranking_by_statistic -- --help'.

NAME
    Facebook-Data-Miner analyzer group ranking_by_statistic

SYNOPSIS
    Facebook-Data-Miner analyzer group - ranking_by_statistic <flags>

FLAGS
    --by=BY
    --ranking=RANKING
    --top=TOP


We can change the `by` parameter to word count (`wc`) or character count (`cc`)...

In [52]:
!./miner/cli.py analyzer private - ranking_by_statistic --by=cc

Benedek Elek: 41.76904176904177
Foo Bar:      34.3980343980344
Bugs Bunny:   12.285012285012286
Tőke Hal:     11.547911547911548


Note how this changes the ranking. 

The output shows ranking in percent, but we can change it to absolute count.

In [53]:
!./miner/cli.py  analyzer private - ranking_by_statistic --by=wc --ranking=count

Foo Bar:      34
Benedek Elek: 32
Bugs Bunny:   14
Tőke Hal:     11


If you try this out with your own data, you will quite possibly get tons of entries here. Change the `top` parameter if you want to limit the number of outputs.

In [54]:
!./miner/cli.py analyzer private - ranking_by_statistic --by=text_mc --ranking=count --top=3

Foo Bar:    8
Tőke Hal:   7
Bugs Bunny: 4
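
How the `ranking` and `top` parameters interact can be sketched as follows. The function `rank` is hypothetical and written only for illustration; in particular, `percent` as the name of the default ranking mode is an assumption. The word counts are taken from the `--by=wc --ranking=count` output above.

```python
def rank(counts, ranking="percent", top=None):
    """Sort {name: count} in descending order, optionally as percentages, optionally truncated."""
    total = sum(counts.values())
    if ranking == "percent":
        # Share of the overall total, expressed in percent.
        scores = {name: 100 * value / total for name, value in counts.items()}
    else:
        scores = dict(counts)
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered if top is None else ordered[:top]


word_counts = {"Foo Bar": 34, "Benedek Elek": 32, "Bugs Bunny": 14, "Tőke Hal": 11}

print(rank(word_counts, ranking="count", top=3))
print(rank(word_counts))
```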


#### Private and Group ConversationStats

**TL;DR**:
We access this object's methods through the same facade through which we access the Analyzer object's methods, although there is quite a difference between the two.

*A detailed description:*
As the name suggests, this class is a container for statistical data about conversations. The basic concept is that it does not know general conversation metadata, since it is constructed only from the messages and the metadata of individual messages (who sent it, what kind of message it is, when it was sent). This object is created by the `MessagingAnalyzer` class, which passes in a DataFrame as input. The DataFrame is created from all the conversations that the analyzer holds (remember, you can filter them down to a single conversation).

So to sum it up, `MessagingAnalyzer` knows about the channels and all the metadata of the conversations, while `ConversationStats` only knows about the messages themselves.

We expose `ConversationStats`' interesting properties and methods, so let's discover them.

In [55]:
!./miner/cli.py  analyzer private - creator




You got a warning because this method only makes sense when exactly one conversation is under analysis. So we should filter the private conversations first.

In [56]:
!./miner/cli.py  analyzer private --channels='Teflon Musk' - creator




In [57]:
!./miner/cli.py analyzer group --channels=marathon - creator

Jenő Rejtő


You can get the timestamp of the first and the last message sent. Remember, if you don't filter the messaging data, you will get the first message ever sent by or to you, and the last message sent by or to you before you downloaded your Facebook data.

In [58]:
!./miner/cli.py analyzer group - start

2011-07-17 15:00:06.580000


In [59]:
!./miner/cli.py  analyzer group --participants="Teflon Musk" - start

In [60]:
!./miner/cli.py  analyzer private --senders="Benedek Elek" - end

2018-01-10 22:08:26.047000


You can get all the `messages` as well, be they `text` or `media`, and you can also get these separately. Since these are pandas DataFrames, you can write them to an output file, just as was possible with `friends` or `conversations`.

In [61]:
!./miner/cli.py  analyzer private - messages

timestamp_ms,sender_name,content,type,partner,videos,audio_files,photos,gifs,reactions,files
2014-09-24 17:02:08.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,,,
2014-11-09 19:56:46.047,Jenő Rejtő,older stuff,Generic,Tőke Hal,,,,,,
2014-11-09 20:13:26.047,Tőke Hal,testing multiple files,Generic,Tőke Hal,,,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,yo,Generic,Tőke Hal,,,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,,,
2014-11-10 12:20:06.047,Tőke Hal,yo,Generic,Tőke Hal,,,,,,
2014-11-10 12:21:46.047,Tőke Hal,zup,Generic,Tőke Hal,,,,,,
2014-11-10 12:21:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,,,
2014-11-10 12:26:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,,,
2014-11-10 12:30:45.145,Jenő Rejtő,not much,Generic,Tőke Hal,,,,,,
2014-11-22 02:17:25.145,Jenő Rejtő,,Generic,Bugs Bunny,,,[{'uri': 'messages/inbox/TeflonMusk_fSD454F/photos/index.jpeg'}],,,
2014-12-03 16:07:25.145,Jenő Rejtő,not,Generic,Tőke Hal,,,,,,
2014-12-26 20:01:46.

In [62]:
!./miner/cli.py  analyzer group --participants="Donald Duck" - messages

timestamp_ms,sender_name,content,type,partner,photos,gifs
2011-07-17 15:00:06.580,Donald Duck,test,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:00:13.721,Foo Bar,what do you test,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:00:32.011,Tőke Hal,basic group messages,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:00:32.012,Dér Dénes,blabla,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2011-07-17 15:02:54.237,Facebook User,ok,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2018-04-19 12:31:42.152,Jenő Rejtő,marathon?,Generic,marathon,,
2018-04-19 12:32:21.074,Foo Bar,yapp yapp :D,Generic,marathon,,
2018-04-19 12:32:35.273,Jenő Rejtő,You named the group marathon.,Generic,marathon,,
2018-04-19 13:35:37.066,Donald Duck,,Generic,marathon,,[{'uri': 'messages/inbox/marathon_sfFSFiD76/gifs/21297336_10214236646047101_8870179296803553280_n_2116299178397271.gif'}]
2018-04-19 13:35:49.717,Donald Duck,i start tod

In [63]:
!./miner/cli.py  analyzer group --senders="Donald Duck" - messages

timestamp_ms,sender_name,content,type,partner,photos,gifs
2011-07-17 15:00:06.580,Donald Duck,test,Generic,"Tőke Hal, Foo Bar, Donald Duck and 2 others",,
2018-04-19 13:35:37.066,Donald Duck,,Generic,marathon,,[{'uri': 'messages/inbox/marathon_sfFSFiD76/gifs/21297336_10214236646047101_8870179296803553280_n_2116299178397271.gif'}]
2018-04-19 13:35:49.717,Donald Duck,i start today,Generic,marathon,,
2018-04-19 13:37:39.673,Donald Duck,,Generic,marathon,"[{'uri': 'messages/inbox/marathon_sfFSFiD76/photos/index.jpeg', 'creation_timestamp': 1524137857}]",
2018-04-19 13:38:02.444,Donald Duck,we could go but running is free,Generic,marathon,,
2018-04-19 14:52:39.709,Donald Duck,:D,Generic,marathon,,



See, we can filter for `participants` and for `senders`. 

Filtering for the former means we want all the messages that were sent in any channel where the subject was a participant.

Filtering for the latter means we only want the subject's messages.
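
The difference between the two filters can be sketched with plain Python structures. The field names (`sender`, `channel`, `content`) and both helper functions are hypothetical; the real application works on pandas DataFrames.

```python
messages = [
    {"sender": "Donald Duck", "channel": "marathon", "content": "i start today"},
    {"sender": "Foo Bar", "channel": "marathon", "content": "yapp yapp :D"},
    {"sender": "John Doe", "channel": "Foo Bar, John Doe and Bugs Bunny", "content": "ok"},
]
members = {
    "marathon": {"Jenő Rejtő", "Donald Duck", "Foo Bar", "Bugs Bunny"},
    "Foo Bar, John Doe and Bugs Bunny": {"Jenő Rejtő", "Foo Bar", "John Doe", "Bugs Bunny"},
}


def filter_by_participant(rows, name):
    """Keep every message from channels in which `name` is a member."""
    return [row for row in rows if name in members[row["channel"]]]


def filter_by_sender(rows, name):
    """Keep only the messages that `name` sent."""
    return [row for row in rows if row["sender"] == name]


print(len(filter_by_participant(messages, "Donald Duck")))
print(len(filter_by_sender(messages, "Donald Duck")))
```

The participant filter returns everything said in Donald Duck's channels, including messages by others, while the sender filter returns only his own messages.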

Now let's get the text and media messages only.

In [64]:
!./miner/cli.py  analyzer private --channels="Foo Bar" - text

timestamp_ms,content
2020-02-13 06:15:28.715,Lorem lorim.. foo bar 😡😡😡
2020-02-13 06:15:38.715,Ut akar ... consequat. oO wow :P xd :D
2020-02-14 12:48:48.047,Duis duia .. ! xdddddd :D
2020-02-14 15:35:28.047,Excepteur...laborum. :D
2020-02-18 00:08:48.047,What the hack? xdddddd :D
2020-04-02 20:08:48.047,Whet? Check this! :P
2020-05-03 12:15:28.123,OUT!
2020-08-08 20:22:08.321,OUT! ❤



In [65]:
!./miner/cli.py  analyzer private --channels="Foo Bar" - media

timestamp_ms,photos,gifs,files,videos,audio_files
2020-02-14 01:42:08.145,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/photos/blueberry-5417154_960_720.jpg'}],,,,
2020-02-14 04:28:48.047,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/files/1810.04805.pdf'}],,
2020-02-14 18:22:08.145,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/gifs/97999627_1419172538270405_8596479473619042304_n_2963870430335255.gif'}],,,
2020-02-18 08:28:48.145,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/gifs/19349964_1624604560892442_7457726181358436352_n_487109582171361.gif'}],,,
2020-02-26 13:42:08.145,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/photos/apple-5391076_960_720.jpg'}],,,,
2020-03-09 11:48:48.047,,,,,"[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/audio/audioclip15905232600004598_2621787141481389.mp4', 'creation_timestamp': 1583750927}]"
2020-04-25 23:42:08.047,,,,"[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/videos/video1501528035_1573509826004448.mp4', 'creation_timestamp': 1587850940, 'thumbnail': {'uri': 'messa

Note that you can also filter by dates. The input flags are `start`, `end`, `period`.

[The other notebook](facebook-data-miner.ipynb) describes this as follows:
> Filtering by `start` and `end` is pretty intuitive. You can use both datetime objects and strings (however, note that strings must be in the `%Y-%m-%d` format, as defined in [ISO_8601](https://en.wikipedia.org/wiki/ISO_8601)). Feel free to play around with these filter parameters.
> Filtering by `period` is less intuitive. `period` in this context means a year, a month, a day, or an hour. It is not so flexible, but pretty comfortable to use. You have to use `period` together with either `start` or `end`. With `start` the range is `from start to start+period`, and with `end` it is `from end-period to end`.
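
The two `period` rules can be sketched with the standard library. This is an illustration under assumptions, not the application's code: `shift` and `window` are hypothetical helpers, and the period names `hour`, `day`, `month`, and `year` are placeholders for whatever values the real flag accepts.

```python
import calendar
from datetime import datetime, timedelta


def shift(moment, period, sign):
    """Move `moment` one period forward (sign=1) or backward (sign=-1)."""
    if period == "hour":
        return moment + sign * timedelta(hours=1)
    if period == "day":
        return moment + sign * timedelta(days=1)
    if period in ("month", "year"):
        months = 1 if period == "month" else 12
        index = moment.year * 12 + (moment.month - 1) + sign * months
        year, month = index // 12, index % 12 + 1
        # Clamp the day so that e.g. 31 January + 1 month stays valid.
        day = min(moment.day, calendar.monthrange(year, month)[1])
        return moment.replace(year=year, month=month, day=day)
    raise ValueError(f"unknown period: {period}")


def window(period, start=None, end=None):
    """Return the (from, to) pair implied by `period` plus exactly one of `start` or `end`."""
    if (start is None) == (end is None):
        raise ValueError("pass exactly one of start or end")
    if start is not None:
        begin = datetime.strptime(start, "%Y-%m-%d")
        return begin, shift(begin, period, 1)  # from start to start+period
    finish = datetime.strptime(end, "%Y-%m-%d")
    return shift(finish, period, -1), finish  # from end-period to end


print(window("month", start="2018-01-01"))
print(window("year", end="2020-02-15"))
```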

In [66]:
!./miner/cli.py  analyzer private --start="2018-01-01" - messages

timestamp_ms,sender_name,content,type,partner,videos,audio_files,photos,gifs,reactions,files
2018-01-10 09:00:28.715,Jenő Rejtő,"yo Legyen az, hogy most megprobalok ekezet nelkul irni. Seems pretty easy. I need some english words in here. Right? A magyar szavak felismereset probalom tesztelni ezzekkel a mondatokkal.",Generic,Benedek Elek,,,,,,
2018-01-10 22:08:26.047,Benedek Elek,zup,Generic,Benedek Elek,,,,,,
2018-01-10 22:17:25.145,Jenő Rejtő,not much,Generic,Benedek Elek,,,,,,
2020-02-13 06:15:28.715,Jenő Rejtő,Lorem lorim.. foo bar 😡😡😡,Generic,Foo Bar,,,,,,
2020-02-13 06:15:38.715,Foo Bar,Ut akar ... consequat. oO wow :P xd :D,Generic,Foo Bar,,,,,"[{'reaction': '❤', 'actor': 'Jenő Rejtő'}]",
2020-02-14 01:42:08.145,Jenő Rejtő,,Generic,Foo Bar,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/photos/blueberry-5417154_960_720.jpg'}],,,
2020-02-14 04:28:48.047,Foo Bar,,Generic,Foo Bar,,,,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/files/1810.04805.pdf'}]
2020-02-14 12:48:48.047,Jenő Rejtő

In [67]:
!./miner/cli.py  analyzer private --end="2020-02-15" - messages

timestamp_ms,sender_name,content,type,partner,photos,gifs,reactions,files
2014-09-24 17:02:08.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,
2014-11-09 19:56:46.047,Jenő Rejtő,older stuff,Generic,Tőke Hal,,,,
2014-11-09 20:13:26.047,Tőke Hal,testing multiple files,Generic,Tőke Hal,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,yo,Generic,Tőke Hal,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,
2014-11-10 12:20:06.047,Tőke Hal,yo,Generic,Tőke Hal,,,,
2014-11-10 12:21:46.047,Tőke Hal,zup,Generic,Tőke Hal,,,,
2014-11-10 12:21:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,
2014-11-10 12:26:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,
2014-11-10 12:30:45.145,Jenő Rejtő,not much,Generic,Tőke Hal,,,,
2014-11-22 02:17:25.145,Jenő Rejtő,,Generic,Bugs Bunny,[{'uri': 'messages/inbox/TeflonMusk_fSD454F/photos/index.jpeg'}],,,
2014-12-03 16:07:25.145,Jenő Rejtő,not,Generic,Tőke Hal,,,,
2014-12-26 20:01:46.047,Bugs Bunny,,Generic,Bugs Bunny,,,,[{'ur

You can write these outputs to a file like this.

In [68]:
!./miner/cli.py  analyzer private - messages --output=$BASE_PATH/out.csv

Data was written to /home/levente/projects/facebook-data-miner/out.csv


In [69]:
!cat $BASE_PATH/out.csv

timestamp_ms,sender_name,content,type,partner,videos,audio_files,photos,gifs,reactions,files
2014-09-24 17:02:08.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,,,
2014-11-09 19:56:46.047,Jenő Rejtő,older stuff,Generic,Tőke Hal,,,,,,
2014-11-09 20:13:26.047,Tőke Hal,testing multiple files,Generic,Tőke Hal,,,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,yo,Generic,Tőke Hal,,,,,,
2014-11-09 23:13:48.715,Jenő Rejtő,are you the real Bugs Bunny?,Generic,Bugs Bunny,,,,,,
2014-11-10 12:20:06.047,Tőke Hal,yo,Generic,Tőke Hal,,,,,,
2014-11-10 12:21:46.047,Tőke Hal,zup,Generic,Tőke Hal,,,,,,
2014-11-10 12:21:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,,,
2014-11-10 12:26:46.047,Bugs Bunny,no,Generic,Bugs Bunny,,,,,,
2014-11-10 12:30:45.145,Jenő Rejtő,not much,Generic,Tőke Hal,,,,,,
2014-11-22 02:17:25.145,Jenő Rejtő,,Generic,Bugs Bunny,,,[{'uri': 'messages/inbox/TeflonMusk_fSD454F/photos/index.jpeg'}],,,
2014-12-03 16:07:25.145,Jenő Rejtő,not,Generic,Tőke Hal,,,,,,
2014-12-26 20:01:46.

In [70]:
!rm $BASE_PATH/out.csv # clean up
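
An exported file can be loaded back in Python. The media and `reactions` columns are written as Python-literal lists (see the rows above), so `ast.literal_eval` can turn them back into objects. Here is a minimal sketch that uses two rows copied from the output above instead of the real file. `parse_rows`, `LIST_COLUMNS` and `SAMPLE` are hypothetical names, not part of the application.

```python
import ast
import csv
import io

# Two rows copied from the CSV output shown in this notebook.
SAMPLE = """\
timestamp_ms,sender_name,content,type,partner,videos,audio_files,photos,gifs,reactions,files
2020-02-13 06:15:38.715,Foo Bar,Ut akar ... consequat. oO wow :P xd :D,Generic,Foo Bar,,,,,"[{'reaction': '❤', 'actor': 'Jenő Rejtő'}]",
2014-11-22 02:17:25.145,Jenő Rejtő,,Generic,Bugs Bunny,,,[{'uri': 'messages/inbox/TeflonMusk_fSD454F/photos/index.jpeg'}],,,
"""

# Columns that hold list literals; a column may be missing in filtered output.
LIST_COLUMNS = ("videos", "audio_files", "photos", "gifs", "reactions", "files")


def parse_rows(handle):
    """Read CSV rows and convert the list-like columns into Python lists."""
    rows = []
    for row in csv.DictReader(handle):
        for column in LIST_COLUMNS:
            raw = row.get(column) or ""
            row[column] = ast.literal_eval(raw) if raw else []
        rows.append(row)
    return rows


rows = parse_rows(io.StringIO(SAMPLE))
print(rows[0]["reactions"][0]["actor"])
print(rows[1]["photos"][0]["uri"])
```

With a real export you would pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` wrapper.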

You can even get a mapping that shows which language each message is written in. We use the [polyglot](https://pypi.org/project/polyglot/) package for this.

In [71]:
!./miner/cli.py analyzer private - message_language_map

are you the real Bugs Bunny?:                                                                                                                                                                 {"lang": "English", "confidence": 96.0}
older stuff:                                                                                                                                                                                  {"lang": "English", "confidence": 92.0}
testing multiple files:                                                                                                                                                                       {"lang": "English", "confidence": 95.0}
yo:                                                                                                                                                                                           null
zup:                                                                                                               

Want the percentage of messages per language? No problem.

In [72]:
!./miner/cli.py analyzer private - message_language_ratio percent

English:      64.70588235294117
Latin:        17.647058823529413
Not detected: 11.764705882352942
Slovenian:    5.882352941176471


Or the count?

In [73]:
!./miner/cli.py analyzer private - message_language_ratio count

English:      11
Latin:        3
Not detected: 2
Slovenian:    1
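
The two outputs are consistent with each other: the counts add up to 17, and each percentage is simply the count divided by that total. A quick check in Python (`to_percent` is a hypothetical helper, not the application's code):

```python
# Counts copied from the `message_language_ratio count` output above.
counts = {"English": 11, "Latin": 3, "Not detected": 2, "Slovenian": 1}


def to_percent(counts):
    """Convert absolute counts into percentages of the total."""
    total = sum(counts.values())
    return {lang: 100 * n / total for lang, n in counts.items()}


percent = to_percent(counts)
for lang, value in percent.items():
    print(f"{lang}: {value}")
```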


You can also get all the messages that have a reaction on them.

In [74]:
!./miner/cli.py analyzer private - reacted_messages

timestamp_ms,sender_name,content,type,partner,videos,audio_files,photos,gifs,reactions,files
2020-02-13 06:15:38.715,Foo Bar,Ut akar ... consequat. oO wow :P xd :D,Generic,Foo Bar,,,,,"[{'reaction': '❤', 'actor': 'Jenő Rejtő'}]",
2020-02-14 18:22:08.145,Jenő Rejtő,,Generic,Foo Bar,,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/gifs/97999627_1419172538270405_8596479473619042304_n_2963870430335255.gif'}],"[{'reaction': '😮', 'actor': 'Foo Bar'}]",
2020-02-18 08:28:48.145,Foo Bar,,Generic,Foo Bar,,,,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/gifs/19349964_1624604560892442_7457726181358436352_n_487109582171361.gif'}],"[{'reaction': '❤', 'actor': 'Jenő Rejtő'}]",



And the percentage of messages that received a reaction?

In [75]:
!./miner/cli.py analyzer private - portion_of_reacted

9.67741935483871
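
The number above matches 3 reacted messages out of 31 in total, expressed as a percentage. The idea can be sketched as a filter followed by a ratio. The sample data and the helpers `reacted` and `portion_of_reacted` below are made up for illustration; the real implementation may differ.

```python
# Hypothetical in-memory messages; only the `reactions` field matters here.
messages = [
    {"content": "Lorem lorim.. foo bar", "reactions": []},
    {"content": "Ut akar ... consequat.",
     "reactions": [{"reaction": "❤", "actor": "Jenő Rejtő"}]},
    {"content": "not much", "reactions": []},
    {"content": "zup", "reactions": []},
]


def reacted(messages):
    """Keep only the messages that carry at least one reaction."""
    return [message for message in messages if message.get("reactions")]


def portion_of_reacted(messages):
    """Percentage of messages with a reaction; 0.0 for an empty input."""
    if not messages:
        return 0.0
    return 100 * len(reacted(messages)) / len(messages)


print(portion_of_reacted(messages))  # one of four sample messages
print(100 * 3 / 31)                  # the ratio reported by the CLI above
```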


The facade exposes low-level statistics: `message` (`mc`), `word` (`wc`), `character` (`cc`), `text message` (`text_mc`) and `media message` (`media_mc`) **counts**. Let's see them.

In [76]:
!./miner/cli.py analyzer private - mc

31


In [77]:
!./miner/cli.py analyzer group - wc

40


In [78]:
!./miner/cli.py analyzer group --channels="Tőke Hal, Foo Bar, Donald Duck and 2 others" - cc

43


In [79]:
!./miner/cli.py analyzer private --start="2018-08-05" - text_mc

8


In [80]:
!./miner/cli.py analyzer private - media_mc

9


You can get the number of unique messages or words.

In [81]:
!./miner/cli.py analyzer private - unique_mc

17


In [82]:
!./miner/cli.py analyzer private - unique_wc

70


Or get the most used messages and words in Messenger.

In [83]:
!./miner/cli.py analyzer private --senders=me --period='y' - most_used_msgs

,unique_values,counts
0,not much,2
1,are you the real Bugs Bunny?,2
2,not,1
3,older stuff,1
4,"yo Legyen az, hogy most megprobalok ekezet nelkul irni. Seems pretty easy. I need some english words in here. Right? A magyar szavak felismereset probalom tesztelni ezzekkel a mondatokkal.",1
5,What the hack? xdddddd :D,1
6,Lorem lorim.. foo bar 😡😡😡,1
7,Whet? Check this! :P,1
8,Duis duia .. ! xdddddd :D,1
9,OUT! ❤,1
10,yo,1



In [84]:
!./miner/cli.py analyzer private --senders=me - most_used_msgs

,unique_values,counts
0,not much,2
1,are you the real Bugs Bunny?,2
2,Lorem lorim.. foo bar 😡😡😡,1
3,OUT! ❤,1
4,yo,1
5,"yo Legyen az, hogy most megprobalok ekezet nelkul irni. Seems pretty easy. I need some english words in here. Right? A magyar szavak felismereset probalom tesztelni ezzekkel a mondatokkal.",1
6,older stuff,1
7,not,1
8,Duis duia .. ! xdddddd :D,1
9,Whet? Check this! :P,1
10,What the hack? xdddddd :D,1



In [85]:
!./miner/cli.py analyzer group --senders=partner - most_used_words

,unique_values,counts
0,test,3
1,messages,2
2,ok,2
3,group,2
4,do,2
5,what,2
6,yapp,2
7,you,2
8,:d,2
9,basic,2
10,running,1
11,today,1
12,start,1
13,i,1
14,free,1
15,is,1
16,but,1
17,go,1
18,blabla,1
19,could,1
20,we,1
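
A ranking like the one above can be approximated with `collections.Counter`. This is a rough sketch only: splitting on whitespace and lowercasing are assumptions (the `:d` entry above suggests lowercasing, but the application's real tokenizer is not shown here), and `most_used_words` is a hypothetical stand-in for the CLI command.

```python
from collections import Counter


def most_used_words(messages):
    """Rank lowercased, whitespace-separated words by frequency."""
    words = (word for text in messages for word in text.lower().split())
    return Counter(words).most_common()


# Made-up sample messages.
sample = ["test messages", "ok test", "what do you do :D", "Test"]

print(",unique_values,counts")
for index, (word, count) in enumerate(most_used_words(sample)):
    print(f"{index},{word},{count}")
```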



You can also access all the types of media messages:
- photos,
- videos,
- gifs,
- audios,
- files.

Use any of them in the following format.

In [86]:
!./miner/cli.py analyzer private - photos

timestamp_ms,photos
2014-11-22 02:17:25.145,[{'uri': 'messages/inbox/TeflonMusk_fSD454F/photos/index.jpeg'}]
2020-02-14 01:42:08.145,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/photos/blueberry-5417154_960_720.jpg'}]
2020-02-26 13:42:08.145,[{'uri': 'messages/inbox/FooBar_n5fd6gG50h/photos/apple-5391076_960_720.jpg'}]



Speaking of media, you can also see the percentage of media messages and its complement, the percentage of text messages.

In [87]:
!./miner/cli.py analyzer private - percentage_of_text_messages

70.96774193548387


In [88]:
!./miner/cli.py analyzer group - percentage_of_media_messages

11.111111111111114


What is your average word length?

In [89]:
!./miner/cli.py analyzer group --senders=me - average_word_length

5.25


OK, we have arrived at the last two features. These are rather interesting.

First, let's group the low-level stats by time.

In [90]:
!./miner/cli.py analyzer private  - get_grouped_time_series_data --timeframe=y

timestamp,mc,text_mc,media_mc,wc,cc
2014-01-01,13,11,2,25,97
2018-01-01,3,3,0,32,170
2020-01-01,15,8,7,34,140



In [91]:
!./miner/cli.py analyzer private  - get_grouped_time_series_data --timeframe=m

timestamp,mc,text_mc,media_mc,wc,cc
2014-09-01,1,1,0,6,23
2014-11-01,10,9,1,18,71
2014-12-01,2,1,1,1,3
2018-01-01,3,3,0,32,170
2020-02-01,10,5,5,27,114
2020-03-01,1,0,1,0,0
2020-04-01,2,1,1,4,17
2020-05-01,1,1,0,1,4
2020-08-01,1,1,0,2,5



In [92]:
!./miner/cli.py analyzer private  - get_grouped_time_series_data --timeframe=d

timestamp,mc,text_mc,media_mc,wc,cc
2014-09-24,1,1,0,6,23
2014-11-09,4,4,0,12,55
2014-11-10,5,5,0,6,16
2014-11-22,1,0,1,0,0
2014-12-03,1,1,0,1,3
2014-12-26,1,0,1,0,0
2018-01-10,3,3,0,32,170
2020-02-13,2,2,0,14,51
2020-02-14,5,2,3,8,42
2020-02-18,2,1,1,5,21
2020-02-26,1,0,1,0,0
2020-03-09,1,0,1,0,0
2020-04-02,1,1,0,4,17
2020-04-25,1,0,1,0,0
2020-05-03,1,1,0,1,4
2020-08-08,1,1,0,2,5



In [93]:
!./miner/cli.py  analyzer private  - get_grouped_time_series_data --timeframe=h

timestamp,mc,text_mc,media_mc,wc,cc
2014-09-24 17:00:00,1,1,0,6,23
2014-11-09 19:00:00,1,1,0,2,10
2014-11-09 20:00:00,1,1,0,3,20
2014-11-09 23:00:00,2,2,0,7,25
2014-11-10 12:00:00,5,5,0,6,16
2014-11-22 02:00:00,1,0,1,0,0
2014-12-03 16:00:00,1,1,0,1,3
2014-12-26 20:00:00,1,0,1,0,0
2018-01-10 09:00:00,1,1,0,29,160
2018-01-10 22:00:00,2,2,0,3,10
2020-02-13 06:00:00,2,2,0,14,51
2020-02-14 01:00:00,1,0,1,0,0
2020-02-14 04:00:00,1,0,1,0,0
2020-02-14 12:00:00,1,1,0,6,20
2020-02-14 15:00:00,1,1,0,2,22
2020-02-14 18:00:00,1,0,1,0,0
2020-02-18 00:00:00,1,1,0,5,21
2020-02-18 08:00:00,1,0,1,0,0
2020-02-26 13:00:00,1,0,1,0,0
2020-03-09 11:00:00,1,0,1,0,0
2020-04-02 20:00:00,1,1,0,4,17
2020-04-25 23:00:00,1,0,1,0,0
2020-05-03 12:00:00,1,1,0,1,4
2020-08-08 20:00:00,1,1,0,2,5
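
To make the grouping concrete, here is a small standard-library sketch that buckets message timestamps by truncating them to the start of the year, month, day or hour. It only reproduces the `mc` column, and `group_counts` and `FORMATS` are hypothetical names; the application itself may well do this differently.

```python
from collections import Counter
from datetime import datetime

# Each timeframe maps to a label that truncates the timestamp to its bucket.
FORMATS = {
    "y": "%Y-01-01",
    "m": "%Y-%m-01",
    "d": "%Y-%m-%d",
    "h": "%Y-%m-%d %H:00:00",
}


def group_counts(timestamps, timeframe):
    """Count messages per bucket; only buckets that contain data appear."""
    labels = (moment.strftime(FORMATS[timeframe]) for moment in timestamps)
    return dict(sorted(Counter(labels).items()))


# A few timestamps taken from the message listings earlier in this notebook.
timestamps = [
    datetime(2014, 9, 24, 17, 2, 8),
    datetime(2014, 11, 9, 19, 56, 46),
    datetime(2014, 11, 9, 20, 13, 26),
    datetime(2018, 1, 10, 9, 0, 28),
]

for timeframe in "ymdh":
    print(timeframe, group_counts(timestamps, timeframe))
```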



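Then, let's examine in which timeframes you were (or are) the most active. Note the pattern: for `m`, `d` and `h` the counts are aggregated across all years into recurring buckets (month of the year, day of the week, hour of the day), and empty buckets are listed with 0.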

In [94]:
!./miner/cli.py analyzer private  - stats_per_timeframe --timeframe=y

2009: 0
2010: 0
2011: 0
2012: 0
2013: 0
2014: 13
2015: 0
2016: 0
2017: 0
2018: 3
2019: 0
2020: 15


In [95]:
!./miner/cli.py analyzer private  - stats_per_timeframe --timeframe=m

january:   3
february:  10
march:     1
april:     2
may:       1
june:      0
july:      0
august:    1
september: 1
october:   0
november:  10
december:  2


In [96]:
!./miner/cli.py analyzer private  - stats_per_timeframe --timeframe=d

monday:    6
tuesday:   2
wednesday: 6
thursday:  3
friday:    6
saturday:  3
sunday:    5


In [97]:
!./miner/cli.py analyzer private  - stats_per_timeframe --timeframe=h

0:  1
1:  1
2:  1
3:  0
4:  1
5:  0
6:  2
7:  0
8:  1
9:  1
10: 0
11: 1
12: 7
13: 1
14: 0
15: 1
16: 1
17: 1
18: 1
19: 1
20: 4
21: 0
22: 2
23: 3
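
Unlike the grouped time series, these statistics fold every message into a fixed set of recurring buckets, so empty buckets still show up with a zero. A minimal sketch of that idea for weekdays and hours, using only the standard library (`messages_per_weekday` and `messages_per_hour` are hypothetical helpers, not the CLI's implementation):

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def messages_per_weekday(timestamps):
    """Count messages per day of the week, keeping empty days as 0."""
    counts = Counter(WEEKDAYS[moment.weekday()] for moment in timestamps)
    return {day: counts.get(day, 0) for day in WEEKDAYS}


def messages_per_hour(timestamps):
    """Count messages per hour of the day (0-23), keeping empty hours as 0."""
    counts = Counter(moment.hour for moment in timestamps)
    return {hour: counts.get(hour, 0) for hour in range(24)}


# A few timestamps taken from the message listings earlier in this notebook.
timestamps = [
    datetime(2014, 11, 9, 19, 56, 46),
    datetime(2014, 11, 10, 12, 20, 6),
    datetime(2014, 11, 10, 12, 21, 46),
    datetime(2018, 1, 10, 9, 0, 28),
]

for day, count in messages_per_weekday(timestamps).items():
    print(f"{day}: {count}")
```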


### People
People is an abstracted class that combines the people from the messaging system with your friends. It is a one-method interface.

In [98]:
!./miner/cli.py people

,name,friend,message_dir,media_dir
0,Guy Fawkes,True,,
1,Daisy Duck,True,,
2,Bugs Bunny,True,teflonmusk_fsd454f,TeflonMusk_fSD454F
3,Dér Dénes,True,,
4,Tőke Hal,True,tokehal_sdf7fs9d876,
5,Foo Bar,True,foobar_n5fd6gG50h,FooBar_n5fd6gG50h
6,Szett Droxler,True,,
7,Donald Duck,True,,
8,John Doe,True,,
9,Benedek Elek,,benedekelek_s4f65sdg,
10,Jenő Rejtő,,,
11,Facebook User,,,



You can add an `--output` flag to write this to a file, as usual.
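
The way friends and conversation partners end up in one table can be pictured as a simple merge keyed by name. The sketch below is purely illustrative: `Person` and `combine` are hypothetical and are not the classes used by the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """One row of the combined table (a simplified, hypothetical model)."""
    name: str
    friend: Optional[bool] = None
    message_dir: Optional[str] = None


def combine(friends, conversations):
    """Merge a list of friend names with a {name: message_dir} mapping."""
    people = {name: Person(name, friend=True) for name in friends}
    for name, message_dir in conversations.items():
        # Conversation partners who are not friends get a fresh entry.
        person = people.setdefault(name, Person(name))
        person.message_dir = message_dir
    return people


people = combine(
    friends=["Bugs Bunny", "Tőke Hal"],
    conversations={
        "Tőke Hal": "tokehal_sdf7fs9d876",
        "Benedek Elek": "benedekelek_s4f65sdg",
    },
)
for person in people.values():
    print(person)
```

As in the table above, someone you only exchanged messages with (here `Benedek Elek`) has no `friend` value, while a friend without a conversation has no `message_dir`.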

### Report
<a id='report'>The</a> `report` node of the interface creates nicely formatted tables. Let's see what's in the box.

In [99]:
!./miner/cli.py report

NAME
    Facebook-Data-Miner report

SYNOPSIS
    Facebook-Data-Miner report COMMAND

COMMANDS
    COMMAND is one of the following:

     basic_stats

     stats_per_timeframe


In [100]:
!./miner/cli.py report basic_stats

+---------+--------------+---------------+------+-----------+----------------+-------------+
| Message | Text message | Media message | Word | Character | Unique message | Unique word |
+---------+--------------+---------------+------+-----------+----------------+-------------+
|    31   |      22      |       9       |  91  |    407    |       17       |      70     |
+---------+--------------+---------------+------+-----------+----------------+-------------+


We have seen this already, but this output is more concise and prettier, of course.

The following tables should look familiar as well.

In [101]:
!./miner/cli.py report stats_per_timeframe --timeframe=y

+-----------+------+------+------+------+------+------+------+------+------+------+------+------+
|           | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
+-----------+------+------+------+------+------+------+------+------+------+------+------+------+
|  Message  |  0   |  0   |  0   |  0   |  0   |  13  |  0   |  0   |  0   |  3   |  0   |  15  |
|    Word   |  0   |  0   |  0   |  0   |  0   |  25  |  0   |  0   |  0   |  32  |  0   |  34  |
| Character |  0   |  0   |  0   |  0   |  0   |  97  |  0   |  0   |  0   | 170  |  0   | 140  |
+-----------+------+------+------+------+------+------+------+------+------+------+------+------+


In [102]:
!./miner/cli.py report stats_per_timeframe --timeframe=m

+-----------+---------+----------+-------+-------+-----+------+------+--------+-----------+---------+----------+----------+
|           | january | february | march | april | may | june | july | august | september | october | november | december |
+-----------+---------+----------+-------+-------+-----+------+------+--------+-----------+---------+----------+----------+
|  Message  |    3    |    10    |   1   |   2   |  1  |  0   |  0   |   1    |     1     |    0    |    10    |    2     |
|    Word   |    32   |    27    |   0   |   4   |  1  |  0   |  0   |   2    |     6     |    0    |    18    |    1     |
| Character |   170   |   114    |   0   |   17  |  4  |  0   |  0   |   5    |     23    |    0    |    71    |    3     |
+-----------+---------+----------+-------+-------+-----+------+------+--------+-----------+---------+----------+----------+


In [103]:
!./miner/cli.py report stats_per_timeframe --timeframe=d

+-----------+--------+---------+-----------+----------+--------+----------+--------+
|           | monday | tuesday | wednesday | thursday | friday | saturday | sunday |
+-----------+--------+---------+-----------+----------+--------+----------+--------+
|  Message  |   6    |    2    |     6     |    3     |   6    |    3     |   5    |
|    Word   |   6    |    5    |     39    |    18    |   8    |    2     |   13   |
| Character |   16   |    21   |    196    |    68    |   42   |    5     |   59   |
+-----------+--------+---------+-----------+----------+--------+----------+--------+


In [104]:
!./miner/cli.py report stats_per_timeframe --timeframe=h

+-----------+----+---+---+---+---+---+----+---+---+-----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|           | 0  | 1 | 2 | 3 | 4 | 5 | 6  | 7 | 8 |  9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
+-----------+----+---+---+---+---+---+----+---+---+-----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  Message  | 1  | 1 | 1 | 0 | 1 | 0 | 2  | 0 | 1 |  1  | 0  | 1  | 7  | 1  | 0  | 1  | 1  | 1  | 1  | 1  | 4  | 0  | 2  | 3  |
|    Word   | 5  | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 |  29 | 0  | 0  | 13 | 0  | 0  | 2  | 1  | 6  | 0  | 2  | 9  | 0  | 3  | 7  |
| Character | 21 | 0 | 0 | 0 | 0 | 0 | 51 | 0 | 0 | 160 | 0  | 0  | 40 | 0  | 0  | 22 | 3  | 23 | 0  | 10 | 42 | 0  | 10 | 25 |
+-----------+----+---+---+---+---+---+----+---+---+-----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+


### Plot

We can also create some plots with the `plot` node. See the possible commands below.

**NOTE**: since we are calling shell commands from the terminal (and possibly also because of Python Fire), the plots will not show up. There will be another notebook covering these plots.

In [105]:
!./miner/cli.py plot

NAME
    Facebook-Data-Miner plot

SYNOPSIS
    Facebook-Data-Miner plot COMMAND

COMMANDS
    COMMAND is one of the following:

     plot_convo_type_ratio

     plot_msg_type_ratio

     plot_ranking_of_friends_by_stats

     plot_stat_count_over_time_series

     plot_stat_count_per_time_period
