## Importing the Library
To use the library, import the `Stats` class.


In [1]:
from socialysis import Stats


## Processing Raw Data
Before you can use the library, your data needs to be processed. The `Stats` class extracts the data and converts the Facebook JSON files into a shape that is easier to manipulate.

To initialize the `Stats` class, pass it the directory where your raw data is located. For example:



In [2]:
base = r"C:\Users\user\Downloads\facebook-user"

Then instantiate `Stats` to process the raw data and generate a more readable and processable format.

In [3]:
stats = Stats(base)

Parsing Data ...


100%|████████████████████████████████████████████████████████████████████████████████| 102/102 [02:36<00:00,  1.54s/it]


Building df ...
Convert Timestamps to DateTime format ...


100%|████████████████████████████████████████████████████████████████████████| 140624/140624 [00:14<00:00, 9991.88it/s]


## Customizing Data Generation

`Stats` has several parameters that can be used to customize the data generation process.


### Speeding Up the Process with Parallelization

To speed up the process, you can set the `parallel` parameter to `True` and specify the maximum number of threads to use with the `max_workers` parameter. By default, only two threads are used.



In [4]:
stats = Stats(base, parallel=True, max_workers=16)


Getting Audio Files Duration In Parallel ...


100%|█████████████████████████████████████████████████████████████████████████████| 2386/2386 [00:23<00:00, 101.26it/s]


Parsing Data ...


100%|████████████████████████████████████████████████████████████████████████████████| 102/102 [00:02<00:00, 44.50it/s]


Building df ...
Convert Timestamps to DateTime format ...


100%|███████████████████████████████████████████████████████████████████████| 140624/140624 [00:13<00:00, 10192.36it/s]
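Rather than hard-coding `max_workers`, one option is to derive it from the number of CPUs on the machine. The helper below is a minimal sketch of that idea. `pick_max_workers` and its `cap` argument are hypothetical names that are not part of socialysis, and the best value in practice also depends on your disk speed and workload.

```python
import os


def pick_max_workers(cap=16):
    """Return a worker count between 1 and `cap`, based on the CPU count.

    Hypothetical helper for illustration; not part of socialysis.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(cap, cpus))


workers = pick_max_workers()
print(workers)

# The result could then be passed on, for example:
# stats = Stats(base, parallel=True, max_workers=workers)
```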


### Discarding Audio Data
Processing audio files is computationally intensive, especially if your data contains a lot of them. If you don't need insights about your audio data, you can discard it by setting the `process_audio` parameter to `False`.

In [5]:
stats = Stats(base, process_audio=False)


Parsing Data ...


100%|████████████████████████████████████████████████████████████████████████████████| 102/102 [00:02<00:00, 46.16it/s]


Building df ...
Convert Timestamps to DateTime format ...


100%|███████████████████████████████████████████████████████████████████████| 140624/140624 [00:13<00:00, 10286.33it/s]


### Changing the Time Unit of Duration
By default, the duration of all of your audio, video, and calls is calculated in seconds. You can change this behavior with the `dur_unit` parameter, which accepts one of the time units `"sec"`, `"minute"`, `"hour"`, or `"day"`.

In [6]:
stats = Stats(base, dur_unit="minute")


Parsing Data ...


100%|████████████████████████████████████████████████████████████████████████████████| 102/102 [02:20<00:00,  1.38s/it]


Building df ...
Convert Timestamps to DateTime format ...


100%|███████████████████████████████████████████████████████████████████████| 140624/140624 [00:13<00:00, 10385.96it/s]
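To make the units concrete, the sketch below converts a duration in seconds into each unit that `dur_unit` accepts. It only illustrates the arithmetic: `SECONDS_PER_UNIT` and `convert_duration` are hypothetical names, and the library's own conversion may differ, for example in rounding.

```python
# Number of seconds in each unit accepted by `dur_unit`.
SECONDS_PER_UNIT = {"sec": 1, "minute": 60, "hour": 3600, "day": 86400}


def convert_duration(seconds, dur_unit="sec"):
    """Convert a duration in seconds to the requested unit.

    Hypothetical helper for illustration; not part of socialysis.
    """
    if dur_unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown dur_unit: {dur_unit!r}")
    return seconds / SECONDS_PER_UNIT[dur_unit]


print(convert_duration(5400, "minute"))  # 90.0
print(convert_duration(5400, "hour"))    # 1.5
```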


## Other Useful Methods

### Saving and Restoring Data
The `freeze` method saves your data so that it can be restored at a later time. This allows you to avoid regenerating the data every time you need to use it.



In [7]:
stats.freeze()


To restore the data instead of regenerating it, use the `restore` parameter when creating the `Stats` object.



In [8]:
stats = Stats(restore=True)
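The idea behind saving and restoring is ordinary caching: pay the processing cost once, then reload the result. The sketch below illustrates that pattern with the standard `pickle` module. It is not how socialysis implements `freeze` or `restore` (the file format and location it uses are not described here), and `load_or_build`, `expensive_build`, and `frozen.pkl` are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path, build):
    """Return cached data if the cache file exists, else build and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    data = build()
    with cache_path.open("wb") as fh:
        pickle.dump(data, fh)
    return data


calls = []  # records how many times the expensive step actually runs


def expensive_build():
    """Stand-in for a slow processing step."""
    calls.append(1)
    return {"messages": 3}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "frozen.pkl"
    first = load_or_build(cache, expensive_build)   # builds and saves
    second = load_or_build(cache, expensive_build)  # loads from disk

print(first == second, len(calls))  # True 1
```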


### Updating Data

The `.update` method adds new data to an existing dataframe without the need to re-download the entire dataset. The new data can be chronologically before or after the existing data, which is useful if you have downloaded your Facebook data at different times and want to combine the downloads into a single dataframe.

The `.first_msg_date` and `.last_msg_date` attributes show the time range of the current data, which helps when deciding which new data to add. For example, if `.first_msg_date` is January 1, 2021 and `.last_msg_date` is December 31, 2021, you can use `.update` to add data from January 1, 2020 to December 31, 2020, or from January 1, 2022 to December 31, 2022.

In [9]:
stats.first_msg_date

'2021-07-12 22:00:28.711000'

In [10]:
stats.last_msg_date

'2022-05-28 20:45:20.301000'

To update the data, pass the directory containing the new data to the `.update` method, along with any other relevant parameters such as `parallel` and `process_audio`. You also need to specify whether the new data is chronologically before or after the existing data using the `after` parameter.

In [11]:
new_base = r'C:\Users\user\Downloads\facebook-user (2)'
stats.update(new_base, after=True, parallel=True, max_workers=16)

Getting Audio Files Duration In Parallel ...


100%|█████████████████████████████████████████████████████████████████████████████| 3586/3586 [00:35<00:00, 100.71it/s]


Parsing Data ...


100%|████████████████████████████████████████████████████████████████████████████████| 149/149 [00:03<00:00, 39.53it/s]


Building df ...
Convert Timestamps to DateTime format ...


100%|███████████████████████████████████████████████████████████████████████| 183666/183666 [00:18<00:00, 10146.73it/s]


In [12]:
# Our data is now up-to-date.
stats.first_msg_date, stats.last_msg_date

('2021-07-12 22:00:28.711000', '2022-09-24 07:57:32.235000')
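Choosing the value of `after` comes down to comparing the time range of the new download with the current `.first_msg_date` and `.last_msg_date`. The helper below is a minimal sketch of that comparison. `suggest_after` is a hypothetical name that is not part of socialysis, the range of the new export used in the example is an assumed value, and the sketch assumes the dates are strings in the format shown in the outputs above.

```python
from datetime import datetime


def suggest_after(current_first, current_last, new_first, new_last):
    """Suggest a value for `after`, or None if the new export adds nothing.

    Hypothetical helper for illustration; not part of socialysis.
    If the new export extends the range on both sides, True is returned.
    """
    cur_first = datetime.fromisoformat(current_first)
    cur_last = datetime.fromisoformat(current_last)
    new_start = datetime.fromisoformat(new_first)
    new_end = datetime.fromisoformat(new_last)

    if new_end > cur_last:
        return True   # new export reaches past the current data
    if new_start < cur_first:
        return False  # new export starts before the current data
    return None       # new export lies inside the current range


# Current range taken from the outputs of In [9] and In [10];
# the start of the new export is an assumed, illustrative value.
print(suggest_after(
    "2021-07-12 22:00:28.711000",
    "2022-05-28 20:45:20.301000",
    "2022-05-29 00:00:00",
    "2022-09-24 07:57:32.235000",
))  # True
```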