# Facebook Data Miner Command-Line Interface

This notebook has two main purposes:
1. show off the CLI through examples,
2. serve as a reference for testing the CLI.

The application has a well-defined CLI. The number of function combinations we wanted to expose was far too large to handle with [Click](https://click.palletsprojects.com/en/7.x/) and its decorators. Instead, we went for the simple-stupid [Python Fire](https://google.github.io/python-fire/guide/). I call it simple-stupid because, besides writing the functions we wanted to expose and a few facade classes, the only thing we had to do was pass the application's main entrypoint to Python Fire like this:
```
from fire import Fire

app = App(DATA_PATH)  # the application's main entrypoint object
Fire(app, name='Facebook-Data-Miner')
```

First, let's set up the correct working directory; the CLI needs it to work. In a shell environment, you would usually run something like the following in the root folder of this project:
```
export PYTHONPATH="$PWD"
```

In [None]:
import os
BASE_PATH = None

In [None]:
if not BASE_PATH:
    BASE_PATH = os.path.dirname(os.path.abspath(os.getcwd()))
BASE_PATH

In [None]:
try:
    os.chdir(BASE_PATH)
    print(f"OK! Changed to: {os.getcwd()} directory.")
except OSError:
    print(f"WARNING! Couldn't change directory. Current is: {os.getcwd()}")

## Features of the CLI

Now we can start with the CLI tool. Python Fire takes an object and lets the user call all of the object's public methods (in Python, by convention, these are the methods whose names do not start with an underscore).
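
As a rough illustration of the "public method" convention, the snippet below lists the methods of an object whose names do not start with an underscore, using the standard `inspect` module. This is only a minimal sketch: `Demo` and `public_methods` are hypothetical names, and Python Fire's real member resolution is more involved (it can also access non-callable attributes).

```python
import inspect


class Demo:
    """Hypothetical stand-in for the App object that is passed to Fire."""

    def friends(self):
        return "friends"

    def conversations(self):
        return "conversations"

    def _load_data(self):  # leading underscore: treated as private
        return "hidden"


def public_methods(obj):
    """Return the names of obj's bound methods that do not start with '_'."""
    return sorted(
        name
        for name, _ in inspect.getmembers(obj, predicate=inspect.ismethod)
        if not name.startswith("_")
    )


print(public_methods(Demo()))  # ['conversations', 'friends']
```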

In this notebook we will cover the following groups of methods:
- friends,
- conversations,
- analyzer,
- people,
- report,
- plot.

You can get detailed information on each of these groups by adding its keyword after the name of the file you are calling, that is, `./miner/app.py`. Let's get the help for the main entrypoint first (NOTE: Python Fire pipes the `--help` output into a pager, so the output will be cut after one screen).

In [None]:
!./miner/app.py --help

### Friends

In [None]:
!./miner/app.py friends --help

As you can see, `friends` has no further subcommands to follow it up with. The description should make it quite clear what these parameters/flags do. Let's first use it without flags.

In [None]:
!./miner/app.py friends

In [None]:
!./miner/app.py friends --sort=name

In [None]:
!./miner/app.py friends --sort=name --dates=False

Notice that this output is formatted as CSV. You can also format it as JSON: just pass `json` as the value of the `--output` flag.

In [None]:
!./miner/app.py friends --sort=name --output=json

We can also write this to a file instead of `stdout` by passing a path to `--output`.

In [None]:
!./miner/app.py friends --sort=name --output=$PWD/notebooks/out.csv

In [None]:
!cat $PWD/notebooks/out.csv

Or you can write it as JSON: just pass a filename that ends with `.json`.

In [None]:
!./miner/app.py friends --sort=name --output=$PWD/notebooks/out.json

In [None]:
!cat $PWD/notebooks/out.json

Let's clean up.

In [None]:
!rm $PWD/notebooks/out.csv
!rm $PWD/notebooks/out.json

### Conversations
`conversations` is also a node that points to a single function, so let's dive in.

In [None]:
!./miner/app.py conversations --help

In [None]:
!./miner/app.py conversations

As you can see, calling this node of the interface prints out all the private conversation data we have. It is not visible here, but the parameter `kind` internally defaults to `private`. We can change this to `group`.

In [None]:
!./miner/app.py conversations --kind=group

You can also filter the conversations down to specific channels. You can pass multiple values to one flag, but the way they are separated is rather clunky: the separator is `;!;`. The reason is that group names can contain almost any character (commas included), so an unusual character combination is needed; in practice it should almost never clash with a real channel name.
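
To see why an unusual separator helps, here is a minimal sketch of splitting such a multi-value flag with plain `str.split`. The function name `split_values` and the whitespace stripping are assumptions for illustration, not the project's actual parser. Note how a group name full of commas stays intact.

```python
SEPARATOR = ";!;"  # the multi-value separator described above


def split_values(raw):
    """Hypothetical helper: split a multi-value flag such as --channels into names."""
    return [value.strip() for value in raw.split(SEPARATOR) if value.strip()]


print(split_values("Foo Bar;!;Teflon Musk"))
# ['Foo Bar', 'Teflon Musk']
print(split_values("Tőke Hal, Foo Bar, Donald Duck and 2 others"))
# ['Tőke Hal, Foo Bar, Donald Duck and 2 others']  (the commas do not split it)
```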

In [None]:
!./miner/app.py conversations --channels="Foo Bar;!;Teflon Musk"

You can also select columns by listing the names of the columns you want in the output.

In [None]:
!./miner/app.py conversations --channels="Foo Bar" --cols='sender_name;!;type;!;content'

The `output` parameter works the same way as for the `friends` node. There are four possibilities: CSV or JSON, written either to a file or to `stdout`.
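
The four cases can be sketched with the standard `csv` and `json` modules. This is a hypothetical illustration, not the project's code: the function name `write_output` is made up, and treating any path that does not end in `.json` as CSV is an assumption.

```python
import csv
import io
import json


def write_output(rows, output="csv"):
    """Hypothetical sketch of the four `--output` cases.

    "csv" or "json"          -> return the formatted text (printed to stdout)
    a path ending in ".json" -> write JSON to that file
    any other path           -> write CSV to that file (assumption)
    """
    if output == "json" or output.endswith(".json"):
        text = json.dumps(rows, ensure_ascii=False, indent=2)
    else:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
        text = buffer.getvalue()
    if output in ("csv", "json"):
        return text  # stdout case
    with open(output, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
    return None


friends = [{"name": "Foo Bar", "date": "2020-01-01"}]  # made-up sample row
print(write_output(friends, "csv"))
```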

### Messaging Analyzer
Now we get to the most complex node of this CLI, so let's start with the help message. As you will see, it only prints information about the `analyzer` function's signature.

In [None]:
!./miner/app.py analyzer --help

The function has two distinct outcomes, depending on whether you provide a value for the `kind` parameter.

**If not**, you get the methods of the rather high-level `MessagingAnalyzer` object, which analyze messages in both private and group channels.

**Otherwise**, if `kind` has the value `private` or `group`, you will get a facade object to the lower-level `{Private|Group}MessagingAnalyzer` and `{Private|Group}ConversationStats` objects. This facade has almost 50 methods, and many of them take parameters as well. We will cover all of them for the sake of completeness, and also because this notebook serves as a test subject. This will be a really long part of the notebook, but it contains most of the information. However, if you want to skip to the [goodies](#report), like reports and plots, go ahead.

Let's start with the first case: no value provided for `kind`.

In [None]:
!./miner/app.py analyzer

Let's go over these methods. 

**IMPORTANT NOTE**: as you have seen, the `analyzer` function has quite a lot of input parameters, all of which can be left empty. To tell Python Fire that you don't want to fill them in, you have to use a separator. The default is a dash (`-`), but you can change this to almost any character (see [this](https://google.github.io/python-fire/guide/#calling-functions) for reference).
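
To make the role of the `-` token concrete, here is a toy model that only splits a command line into stages. It is a hypothetical illustration, not Python Fire's implementation, which also handles flags, defaults and custom separators. Conceptually, the first stage calls `analyzer` with whatever arguments precede the `-` (the rest keep their defaults), and the next stage is looked up on the object that call returns.

```python
def split_on_separator(args, separator="-"):
    """Toy model: split command-line tokens into stages at each separator token."""
    stages, current = [], []
    for arg in args:
        if arg == separator:
            stages.append(current)  # close the current call
            current = []
        else:
            current.append(arg)
    stages.append(current)
    return stages


print(split_on_separator(["analyzer", "private", "-", "is_group"]))
# [['analyzer', 'private'], ['is_group']]
```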

In [None]:
!./miner/app.py analyzer - all_interactions --help # NOTE the single `-` after the analyzer

Note also that you can use `--help` with the methods as well. Python Fire will read the signature and the docstring of the method and create a help message from them.

In [None]:
!./miner/app.py analyzer - people_i_have_private_convo_with

In [None]:
!./miner/app.py analyzer - people_i_have_group_convo_with

In [None]:
!./miner/app.py analyzer - get_who_i_have_private_convo_with_from_a_group --group_name=marathon

In [None]:
!./miner/app.py analyzer - how_much_i_speak_in_private_with_group_members --group_name=marathon

In [None]:
!./miner/app.py analyzer - is_priv_msg_first_then_group --name='Foo Bar'

#### Private and Group Messaging Analyzer
Both `private` and `group` are created from the very same class, but because of minor differences in the inner structure of the two channel types, some methods make more sense for one than for the other. We will cover this soon.

Now let's start with a help message.

In [None]:
!./miner/app.py analyzer private --help

And one for `group` as well; delete the `#` sign if you want to verify.

In [None]:
#!./miner/app.py analyzer group --help

As you can see, both help pages look the same.

As described above, the facade we get exposes methods from both the higher-level `MessagingAnalyzer` and the lower-level `ConversationStats`. So in this section we will dive deep into these functionalities.

In [None]:
!./miner/app.py analyzer private - is_group

In [None]:
!./miner/app.py analyzer group - is_group

Just as we anticipated. Now let's list all the channels for both `private` and `group`. A channel is a Messenger conversation, either with a single person or in a group.

In [None]:
!./miner/app.py analyzer private - channels

In [None]:
!./miner/app.py analyzer group - channels

You can also get the number of these channels.

In [None]:
!./miner/app.py analyzer group - number_of_channels

Now let's list all the participants of these channels.

In [None]:
!./miner/app.py analyzer private - participants

In [None]:
!./miner/app.py analyzer group - participants

These are all the people who are in the above channels. Not all of them contributed to the channels, though. To show this, we filter the group messages down to a single channel, where the test user `Teflon Musk` has not contributed.

In [None]:
!./miner/app.py analyzer group --channels=marathon - participants

In [None]:
!./miner/app.py analyzer group --channels=marathon - contributors

As you can see, there are only 3 contributors, as opposed to 4 participants. Note that you can also use the `number_of_contributors` method.
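
The distinction boils down to a set comparison: participants are everyone in the channel, contributors are only those who sent at least one message. A minimal sketch on made-up data (the dictionary layout is an assumption, not the project's data model):

```python
# Hypothetical, made-up channel: who is in it vs. who actually sent messages.
channel = {
    "participants": {"Foo Bar", "John Doe", "Teflon Musk", "me"},
    "messages": [
        {"sender_name": "Foo Bar", "content": "Hi!"},
        {"sender_name": "John Doe", "content": "Hello"},
        {"sender_name": "me", "content": "Hey"},
    ],
}

participants = channel["participants"]
contributors = {message["sender_name"] for message in channel["messages"]}

print(len(participants), len(contributors))  # 4 3
print(sorted(participants - contributors))   # ['Teflon Musk']
```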

Next, we have the `participant_to_channel_map` data structure, which records which people are in which channel. For `private` it's fairly straightforward (if not redundant), as the channel name and the participant are exactly the same, but for `group` this can be really useful information.
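
Such a map is essentially the inverse of a channel -> participants mapping. Here is a minimal sketch with `collections.defaultdict` on made-up data; the input layout and function name are assumptions. Looking up one name in the result is conceptually what `all_channels --name=...` answers further below.

```python
from collections import defaultdict

# Hypothetical input: channel name -> its participants (made-up data).
channels = {
    "Foo Bar": ["Foo Bar"],  # private: channel name == participant
    "marathon": ["Foo Bar", "John Doe", "Teflon Musk"],
    "Teflon Musk": ["Teflon Musk"],
}


def participant_to_channel_map(channels):
    """Invert channel -> participants into participant -> channels."""
    mapping = defaultdict(list)
    for channel, members in channels.items():
        for member in members:
            mapping[member].append(channel)
    return dict(mapping)


print(participant_to_channel_map(channels))
# {'Foo Bar': ['Foo Bar', 'marathon'], 'John Doe': ['marathon'], 'Teflon Musk': ['marathon', 'Teflon Musk']}
```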

In [None]:
!./miner/app.py analyzer private - participant_to_channel_map

In [None]:
!./miner/app.py analyzer group - participant_to_channel_map

Next, let's check the number of conversations created by our test user.

In [None]:
!./miner/app.py analyzer private - number_of_convos_created_by_me

In [None]:
!./miner/app.py analyzer group - number_of_convos_created_by_me

We also have `min_channel_size`, `mean_channel_size`, and `max_channel_size`. Again, for `private` all of these will be 2, but for `group` they are more interesting.
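
These three are ordinary summary statistics over the number of participants per channel. A minimal sketch with the standard `statistics` module on made-up sizes (counting yourself as a participant is an assumption, consistent with private channels having size 2):

```python
from statistics import mean

# Hypothetical channel sizes (number of participants, including you).
channel_sizes = {"marathon": 4, "Foo Bar and 3 others": 5, "family": 3}

print(min(channel_sizes.values()))   # 3
print(mean(channel_sizes.values()))  # 4
print(max(channel_sizes.values()))   # 5
```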

In [None]:
!./miner/app.py analyzer group - min_channel_size

In [None]:
!./miner/app.py analyzer group - mean_channel_size

In [None]:
!./miner/app.py analyzer group - max_channel_size

We can get all the channels we share with one of our conversation partners, which is again more useful for `group` conversations.

In [None]:
!./miner/app.py analyzer group - all_channels --name='Foo Bar'

In [None]:
!./miner/app.py analyzer group - all_channels --name='John Doe'

Another nice method is `ranking_by_statistic`. It ranks the participants of the conversations by a chosen statistic.

In [None]:
!./miner/app.py analyzer private - ranking_by_statistic

In [None]:
!./miner/app.py analyzer group - ranking_by_statistic

Note that this method takes some parameters. Let's see its help message.

In [None]:
!./miner/app.py analyzer group - ranking_by_statistic --help

We can change the `by` parameter to word count (wc) or character count (cc)...

In [None]:
!./miner/app.py analyzer private - ranking_by_statistic --by=cc

Note how this changes the ranking. 

By default, the output shows the ranking in percent, but we can change it to absolute counts.

In [None]:
!./miner/app.py analyzer private - ranking_by_statistic --by=wc --ranking=count

If you try this out with your own data, you will quite possibly get tons of entries here. Set the `top` parameter to limit the number of rows in the output.
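
The interplay of the `ranking` and `top` parameters can be sketched with `collections.Counter`. This is a hypothetical illustration on made-up message counts. Whether the real method computes percentages exactly this way (share of the total, rounded) is an assumption.

```python
from collections import Counter

# Made-up per-partner values of the `by` statistic (here: message counts).
stats = Counter({"Foo Bar": 50, "John Doe": 30, "Teflon Musk": 15, "Donald Duck": 5})


def rank_partners(stats, ranking="percent", top=None):
    """Rank partners by a statistic, as percentages or raw counts, keeping `top` rows."""
    total = sum(stats.values())
    rows = stats.most_common(top)  # most_common(None) returns every entry
    if ranking == "percent":
        return [(name, round(100 * value / total, 2)) for name, value in rows]
    return rows


print(rank_partners(stats, top=2))
# [('Foo Bar', 50.0), ('John Doe', 30.0)]
print(rank_partners(stats, ranking="count", top=3))
# [('Foo Bar', 50), ('John Doe', 30), ('Teflon Musk', 15)]
```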

In [None]:
!./miner/app.py analyzer private - ranking_by_statistic --by=text_mc --ranking=count --top=3

#### Private and Group ConversationStats

**TL;DR**:
We access this object's methods through the same facade as the analyzer's methods, although the two objects are quite different.

*A detailed description:*
As the name suggests, this class is a container for statistical data about conversations. The basic concept is that it knows nothing about general conversation metadata, since it is constructed only from the messages and the metadata of individual messages (who sent it, what kind of message it is, when it was sent). This object is created by the `MessagingAnalyzer` class, which passes in a DataFrame as input. The DataFrame is built from all the conversations the analyzer holds (remember, you can filter them down to a single conversation).

So to sum it up, `MessagingAnalyzer` knows about the channels and all the metadata of the conversations, while `ConversationStats` only knows about the messages themselves.

We expose the interesting properties and methods of `ConversationStats`, so let's explore them.

In [None]:
!./miner/app.py analyzer private - creator

You got a warning because this method only makes sense when a single conversation is under analysis. So we should filter the private conversations first.

In [None]:
!./miner/app.py analyzer private --channels='Teflon Musk' - creator

In [None]:
!./miner/app.py analyzer group --channels=marathon - creator

You can get the timestamps of the first and the last message sent. Remember: if you don't filter the messaging data, you will get the first message ever sent by or to you, and the last message sent by or to you before you downloaded your Facebook data.

In [None]:
!./miner/app.py analyzer group - start

In [None]:
!./miner/app.py analyzer group --participants="Teflon Musk" - start

In [None]:
!./miner/app.py analyzer private --senders="Benedek Elek" - end

You can get all the `messages` as well, be they `text` or `media`, but you can also get these two kinds separately. Since these are pandas DataFrames, you can write them to an output file, just as with `friends` or `conversations`.

In [None]:
!./miner/app.py analyzer private - messages

In [None]:
!./miner/app.py analyzer group --participants="Donald Duck" - messages

In [None]:
!./miner/app.py analyzer group --senders="Donald Duck" - messages

As you can see, we can filter by `participants` and by `senders`.

Filtering by the former means we want all the messages that were sent in a channel where the subject is a participant.

Filtering by the latter means we only want the messages sent by the subject.
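
The two filters can be contrasted in a few lines of plain Python on made-up messages. The message layout and helper names are hypothetical. The participant filter keeps everything from channels the subject is in, while the sender filter keeps only the subject's own messages.

```python
# Made-up messages; each records its channel, the channel's participants and the sender.
messages = [
    {"channel": "marathon", "participants": {"Donald Duck", "Foo Bar", "me"}, "sender": "Foo Bar"},
    {"channel": "marathon", "participants": {"Donald Duck", "Foo Bar", "me"}, "sender": "Donald Duck"},
    {"channel": "Foo Bar", "participants": {"Foo Bar", "me"}, "sender": "me"},
]


def filter_by_participant(messages, name):
    """All messages from channels where `name` is a participant."""
    return [m for m in messages if name in m["participants"]]


def filter_by_sender(messages, name):
    """Only the messages that `name` sent."""
    return [m for m in messages if m["sender"] == name]


print(len(filter_by_participant(messages, "Donald Duck")))  # 2
print(len(filter_by_sender(messages, "Donald Duck")))       # 1
```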

Now let's get the text and media messages only.

In [None]:
!./miner/app.py analyzer private --channels="Foo Bar" - text

In [None]:
!./miner/app.py analyzer private --channels="Foo Bar" - media

Note that you can also filter by dates. The input flags are `start`, `end`, and `period`.

In [the other notebook](facebook-data-miner.ipynb) this is described as follows:
> Filtering by `start` and `end` is pretty intuitive. You can use both datetime objects and strings (however, note that you can only use strings in the `%Y-%m-%d` format, as defined in [ISO_8601](https://en.wikipedia.org/wiki/ISO_8601)). Feel free to play around with these filter parameters.
> Filtering by `period` is less intuitive. `period` in this context means a year, a month, a day, or an hour. It is not very flexible, but pretty comfortable to use. You have to use `period` with either `start` or `end`. With `start` it's like the following equation `from start to start+period` and with `end` it's like `from end-period to end`.
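
The period rule can be made concrete with the standard `datetime` module. This is a hypothetical sketch, not the project's code. The function names, the one-letter period codes (borrowed from the `--timeframe` values used later in this notebook) and the month arithmetic are all assumptions.

```python
from datetime import datetime, timedelta


def shift(moment, period, direction):
    """Move `moment` by one period ('y', 'm', 'd' or 'h'); direction is +1 or -1.

    No day clamping: e.g. 2020-01-31 plus one month raises ValueError.
    """
    if period == "y":
        return moment.replace(year=moment.year + direction)
    if period == "m":
        month_index = moment.month - 1 + direction
        return moment.replace(year=moment.year + month_index // 12, month=month_index % 12 + 1)
    if period == "d":
        return moment + direction * timedelta(days=1)
    if period == "h":
        return moment + direction * timedelta(hours=1)
    raise ValueError(f"unknown period: {period!r}")


def time_window(start=None, end=None, period=None):
    """Return (from, to): start..start+period, or end-period..end."""
    start = datetime.strptime(start, "%Y-%m-%d") if isinstance(start, str) else start
    end = datetime.strptime(end, "%Y-%m-%d") if isinstance(end, str) else end
    if period is None:
        return start, end
    if start is not None:
        return start, shift(start, period, +1)
    if end is not None:
        return shift(end, period, -1), end
    raise ValueError("`period` needs either `start` or `end`")


print(time_window(start="2018-01-01", period="m"))
# (datetime.datetime(2018, 1, 1, 0, 0), datetime.datetime(2018, 2, 1, 0, 0))
```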

In [None]:
!./miner/app.py analyzer private --start="2018-01-01" - messages

In [None]:
!./miner/app.py analyzer private --end="2020-02-15" - messages

You can write these outputs to a file like this.

In [None]:
!./miner/app.py analyzer private - messages --output=$BASE_PATH/out.csv

In [None]:
!cat $BASE_PATH/out.csv

In [None]:
!rm $BASE_PATH/out.csv # clear-up

You can even get a mapping of which messages are in which language. We use the [polyglot](https://pypi.org/project/polyglot/) package for this.

In [None]:
!./miner/app.py analyzer private - message_language_map

Want the percentage of messages in each language? No problem.

In [None]:
!./miner/app.py analyzer private - message_language_ratio percent

Or the count?

In [None]:
!./miner/app.py analyzer private - message_language_ratio count
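
The difference between the `percent` and `count` variants can be sketched with `collections.Counter`. Language detection itself (done with polyglot above) is skipped here: the (message, language) pairs are made up, and the function name `language_ratio` is hypothetical.

```python
from collections import Counter

# Made-up (message, detected language) pairs; detection itself is skipped here.
message_languages = [
    ("Hello!", "en"),
    ("Szia!", "hu"),
    ("How are you?", "en"),
    ("Köszi", "hu"),
    ("OK", "en"),
]


def language_ratio(pairs, kind="percent"):
    """Count messages per language, or return each language's share in percent."""
    counts = Counter(language for _, language in pairs)
    if kind == "count":
        return dict(counts)
    total = sum(counts.values())
    return {language: 100 * n / total for language, n in counts.items()}


print(language_ratio(message_languages, "count"))    # {'en': 3, 'hu': 2}
print(language_ratio(message_languages, "percent"))  # {'en': 60.0, 'hu': 40.0}
```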

You can also get all the messages that have received a reaction.

In [None]:
!./miner/app.py analyzer private - reacted_messages

And the ratio of the reacted messages?

In [None]:
!./miner/app.py analyzer private - portion_of_reacted

The facade exposes low-level statistics, namely message, word, character, text message and media message **counts** (`mc`, `wc`, `cc`, `text_mc` and `media_mc`). Let's see them.
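
To make the abbreviations concrete, here is a minimal sketch computing them, plus the unique counts, on a few made-up messages. The definitions are assumptions for illustration. The sketch splits words on whitespace, counts characters including spaces, and treats every message without text as media. The project may define these differently.

```python
# Made-up messages: text messages have "content", the media message has none.
messages = [
    {"content": "Hello there"},
    {"content": "How are you?"},
    {"content": None, "photos": ["photo_1.jpg"]},
    {"content": "Hello there"},
]

texts = [m["content"] for m in messages if m["content"]]
words = [word for text in texts for word in text.split()]

low_level_stats = {
    "mc": len(messages),                      # all messages
    "text_mc": len(texts),                    # messages with text content
    "media_mc": len(messages) - len(texts),   # everything else (assumption)
    "wc": len(words),                         # whitespace-separated words
    "cc": sum(len(text) for text in texts),   # characters, spaces included
    "unique_mc": len(set(texts)),             # distinct text messages
    "unique_wc": len(set(words)),             # distinct words
}
print(low_level_stats)
```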

In [None]:
!./miner/app.py analyzer private - mc

In [None]:
!./miner/app.py analyzer group - wc

In [None]:
!./miner/app.py analyzer group --channels="Tőke Hal, Foo Bar, Donald Duck and 2 others" - cc

In [None]:
!./miner/app.py analyzer private --start="2018-08-05" - text_mc

In [None]:
!./miner/app.py analyzer private - media_mc

You can get the number of unique messages or words.

In [None]:
!./miner/app.py analyzer private - unique_mc

In [None]:
!./miner/app.py analyzer private - unique_wc

Or get the most frequently used messages and words on Messenger.
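
"Most used" is a frequency ranking, which `collections.Counter.most_common` provides directly. A minimal sketch on made-up messages follows. Lower-casing and whitespace tokenization are assumptions, not necessarily what the project does.

```python
from collections import Counter

# Made-up text messages sent by one person.
texts = ["see you at the run", "see you tomorrow", "ok", "ok", "see you"]

most_used_msgs = Counter(texts).most_common(2)
most_used_words = Counter(
    word for text in texts for word in text.lower().split()
).most_common(2)

print(most_used_msgs)   # [('ok', 2), ('see you at the run', 1)]
print(most_used_words)  # [('see', 3), ('you', 3)]
```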

In [None]:
!./miner/app.py analyzer private --senders=me --period='y' - most_used_msgs

In [None]:
!./miner/app.py analyzer private --senders=me - most_used_msgs

In [None]:
!./miner/app.py analyzer group --senders=partner - most_used_words

You can also access all the types of media messages:
- photos,
- videos,
- gifs,
- audios,
- files.

Use any of them in the following format.

In [None]:
!./miner/app.py analyzer private - photos

Speaking of media, you can also get the percentage of media messages and its complement, the percentage of text messages.

In [None]:
!./miner/app.py analyzer private - percentage_of_text_messages

In [None]:
!./miner/app.py analyzer group - percentage_of_media_messages

What is your average word length?

In [None]:
!./miner/app.py analyzer group --senders=me - average_word_length

OK, we have arrived at the last two features, which are rather interesting.

First, let's group the low-level stats by time.
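
Grouping by timeframe can be pictured as truncating each timestamp to the chosen resolution and counting per bucket. This is a minimal sketch on made-up timestamps. The bucket formats and the function name are hypothetical, and only the message count is shown, not the other low-level stats.

```python
from collections import Counter
from datetime import datetime

# Made-up message timestamps.
timestamps = [
    datetime(2019, 12, 31, 23, 15),
    datetime(2020, 1, 1, 9, 30),
    datetime(2020, 1, 1, 9, 45),
    datetime(2020, 2, 14, 20, 0),
]

# Calendar buckets: each timeframe truncates the timestamp to that resolution.
FORMATS = {"y": "%Y", "m": "%Y-%m", "d": "%Y-%m-%d", "h": "%Y-%m-%d %H:00"}


def grouped_message_counts(timestamps, timeframe):
    """Count messages per calendar year, month, day or hour, in chronological order."""
    counts = Counter(t.strftime(FORMATS[timeframe]) for t in timestamps)
    return dict(sorted(counts.items()))


print(grouped_message_counts(timestamps, "y"))  # {'2019': 1, '2020': 3}
print(grouped_message_counts(timestamps, "m"))  # {'2019-12': 1, '2020-01': 2, '2020-02': 1}
```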

In [None]:
!./miner/app.py analyzer private  - get_grouped_time_series_data --timeframe=y

In [None]:
!./miner/app.py analyzer private  - get_grouped_time_series_data --timeframe=m

In [None]:
!./miner/app.py analyzer private  - get_grouped_time_series_data --timeframe=d

In [None]:
!./miner/app.py analyzer private  - get_grouped_time_series_data --timeframe=h

Next, let's examine in which timeframes you were (or are) the most active. Note the pattern.
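
A "pattern" view differs from the calendar grouping in that it buckets by a recurring unit. For example, every message sent between 9:00 and 10:00 lands in the same bucket, whatever the date. The sketch below is one plausible reading of that idea and is not confirmed against the project's code. The choice of hour of day, weekday, month of year and year as buckets is an assumption, and the timestamps are made up.

```python
from collections import Counter
from datetime import datetime

# Made-up message timestamps spanning several days.
timestamps = [
    datetime(2020, 1, 6, 9, 10),   # Monday morning
    datetime(2020, 1, 7, 9, 40),   # Tuesday morning
    datetime(2020, 1, 8, 21, 5),   # Wednesday evening
    datetime(2020, 1, 13, 9, 55),  # Monday morning
]

# Recurring buckets (assumption): hour of day, weekday (0 = Monday), month, year.
KEYS = {
    "h": lambda t: t.hour,
    "d": lambda t: t.weekday(),
    "m": lambda t: t.month,
    "y": lambda t: t.year,
}


def activity_per_timeframe(timestamps, timeframe):
    """Count messages per recurring unit, e.g. per hour of the day across all days."""
    return Counter(KEYS[timeframe](t) for t in timestamps)


print(activity_per_timeframe(timestamps, "h").most_common(1))  # [(9, 3)]
```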

In [None]:
!./miner/app.py analyzer private  - stats_per_timeframe --timeframe=y

In [None]:
!./miner/app.py analyzer private  - stats_per_timeframe --timeframe=m

In [None]:
!./miner/app.py analyzer private  - stats_per_timeframe --timeframe=d

In [None]:
!./miner/app.py analyzer private  - stats_per_timeframe --timeframe=h

### People
`people` is backed by an abstraction that combines the people from the messaging system with your friends. It is a one-method interface.

In [None]:
!./miner/app.py people

You can add an `--output` flag to write this to a file, as usual.

### Report
<a id='report'>The</a> `report` node of the interface creates nicely formatted tables. Let's see what's in the box.

In [None]:
!./miner/app.py report

In [None]:
!./miner/app.py report basic_stats

We have seen this already, but this output is, of course, more concise and prettier.

The following tables should look familiar as well.

In [None]:
!./miner/app.py report stats_per_timeframe --timeframe=y

In [None]:
!./miner/app.py report stats_per_timeframe --timeframe=m

In [None]:
!./miner/app.py report stats_per_timeframe --timeframe=d

In [None]:
!./miner/app.py report stats_per_timeframe --timeframe=h

### Plot

We can also create some plots with the `plot` node. See the possible commands you can use below.

**NOTE**: since we are running these as shell commands (and possibly also because of Python Fire), the plots will not show up here. There will be another notebook covering these plots.

In [None]:
!./miner/app.py plot