# What This Package Does

This package combines the following steps:
1. Read user-like-page records (shorthand `user_like_page`) and generate a page-by-page matrix containing the number of users shared between each pair of pages (shorthand `page_page_matrix`).
2. Read `page_page_matrix` and page information (`page_info`), apply PCA, and find the first principal component of the pages (we call this `page_score`).
    - When the pages are politics-related, the first principal component is usually highly correlated with the ideology of the page.
3. Read `page_score` and `user_like_page` to calculate the scores of users (we call this `user_score`).

To use these functions, specify the input and output formats you want and the paths of the input and output files. For example, to calculate the page-by-page matrix from user-like-page records:

```python
page_page_matrix = use.fb_score(
    input_format = "user_like_page", 
    output_format = "page_page_matrix", 
    input_path = "input/user_like_page.csv",
    output_path = "temp/page_page_matrix.csv", ...)
```

You can also jump between the steps:

```python
user_like_page_to_user_score = use.fb_score(
    input_format = "user_like_page",
    output_format = "user_score",
    input_path = "input/user_like_page.csv",
    output_path = "output/user_score_from_user_like.csv", ...)
```

# Data Formats

As mentioned above, this package uses five data formats, which take the following forms:

## user_like_page

```
user_id,like_pages
1,"a,b,c"
2,"a,c"
```

This is the primary input format. To generate it, you can use something like

```sql
SELECT
  user_id,
  GROUP_CONCAT(page_id) AS like_pages
FROM some_table
GROUP BY
  user_id
```
in SQL or something like

```r
some_table %>%
  dplyr::group_by(user_id) %>%
  dplyr::summarise(like_pages = paste0(page_id, collapse = ","))
```

in R. However, if the data is large, it is better to use SQL.
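The same grouping can also be sketched in plain Python. This is only an illustration: the `likes` list of `(user_id, page_id)` pairs is a hypothetical stand-in for whatever raw table you have, and the output is written to an in-memory buffer rather than a file.

```python
from collections import defaultdict
import csv
import io

# Hypothetical raw records: one (user_id, page_id) pair per like.
likes = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "c")]

# Group page ids by user, preserving the order in which likes appear.
pages_by_user = defaultdict(list)
for user_id, page_id in likes:
    pages_by_user[user_id].append(page_id)

# Write the user_like_page layout: one row per user, pages joined by commas.
# csv.writer quotes the like_pages field because it contains commas.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["user_id", "like_pages"])
for user_id, pages in pages_by_user.items():
    writer.writerow([user_id, ",".join(pages)])

user_like_page_csv = buffer.getvalue()
print(user_like_page_csv)
```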

## page_page_matrix

```
page_id,a,b,c
a,2,1,2
b,1,1,1
c,2,1,2
```

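This is calculated from `user_like_page` using a Python dictionary. Each off-diagonal cell counts the users who like both pages, and each diagonal cell counts the users who like that page.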
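A minimal sketch of this counting step, assuming the likes have already been parsed into a dictionary keyed by user id. The variable names are illustrative and are not taken from the package's internals.

```python
from collections import defaultdict
from itertools import product

# Toy user_like_page records matching the example above.
user_like_page = {1: ["a", "b", "c"], 2: ["a", "c"]}

# counts[(p, q)] = number of users who like both page p and page q.
counts = defaultdict(int)
for pages in user_like_page.values():
    unique_pages = set(pages)  # count each user at most once per page pair
    for p, q in product(unique_pages, repeat=2):
        counts[(p, q)] += 1

# Lay the dictionary out as a square matrix with sorted page ids.
page_ids = sorted({p for p, _ in counts})
page_page_matrix = [[counts[(p, q)] for q in page_ids] for p in page_ids]

for page_id, row in zip(page_ids, page_page_matrix):
    print(page_id, row)
```

Pairing every liked page with every other liked page is quadratic in the number of pages per user, which is one reason this step can dominate the run time on real data.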

## page_score

```
page_id,page_PC1_std
a,x
b,y
c,z
```

x, y, and z are calculated by applying PCA to `page_page_matrix`. The `page_PC1_std` column is standardized to have mean 0 and standard deviation 1.
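To make the idea concrete, here is a small dependency-free sketch that finds the first principal component by power iteration and then standardizes the scores. It is not the package's implementation: a library PCA would normally be used instead, and the choices below (rows as observations, column centering, population standard deviation) are assumptions.

```python
import math
import statistics

# Toy page_page_matrix from the example above (rows and columns: a, b, c).
page_ids = ["a", "b", "c"]
matrix = [[2, 1, 2], [1, 1, 1], [2, 1, 2]]

# Center each column, treating rows as observations (an assumption).
n = len(matrix)
col_means = [sum(row[j] for row in matrix) / n for j in range(n)]
centered = [[row[j] - col_means[j] for j in range(n)] for row in matrix]

# Covariance matrix of the centered columns.
cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(n)]
       for i in range(n)]

# Power iteration: repeatedly apply cov and renormalize to approach the
# eigenvector with the largest eigenvalue (the first PC direction).
v = [1.0] * n
for _ in range(100):
    w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

# Project each page onto that direction, then standardize the scores.
pc1 = [sum(r[j] * v[j] for j in range(n)) for r in centered]
mean, std = statistics.mean(pc1), statistics.pstdev(pc1)
page_PC1_std = {p: (s - mean) / std for p, s in zip(page_ids, pc1)}
print(page_PC1_std)
```

Pages a and c have identical rows in the toy matrix, so they receive the same score. The overall sign of a principal component is arbitrary.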

## user_score

```
user_id,user_PC1_mean
1,(x+y+z)/3
2,(x+z)/2
```

This is calculated from `page_score` and `user_like_page`: each user's score is the mean of the scores of the pages that user likes.
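A minimal sketch of the averaging, using made-up numbers in place of x, y, and z. Skipping liked pages that have no score is an assumption about how missing pages might be handled, not documented behavior.

```python
from statistics import mean

# Hypothetical standardized page scores (stand-ins for x, y, z above).
page_score = {"a": 0.5, "b": -1.0, "c": 1.5}
user_like_page = {1: ["a", "b", "c"], 2: ["a", "c"]}

user_score = {}
for user_id, pages in user_like_page.items():
    # Keep only liked pages that have a score (an assumption).
    scores = [page_score[p] for p in pages if p in page_score]
    if scores:
        user_score[user_id] = mean(scores)

print(user_score)
```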

## page_info

```
page_id,page_name, ...
a,"Donald J. Trump", ...
b,"Hillary Clinton", ...
c,"Fox News", ...
```

This is mainly to enable humans to interpret the page scores.

# Requirements
Make sure you have the following on your system:
1.  Python 3 and Anaconda
2.  The `tqdm` package, used to display progress bars
    - If it is not installed, run `conda install -c conda-forge tqdm`

# Demo: Step by Step

Below we demonstrate how this package works using sample data:
1.  `input/user_like_page.csv`
2.  `input/page_info.csv`

## Import Module & Read Data

In [3]:
import pandas as pd
import sys
import os
from fbscore import use
user_like_page = pd.read_csv("input/user_like_page.csv")
user_like_page

Unnamed: 0,user_id,like_pages,like_times
0,10152136566053282,627635147529519909830021785951839,111
1,10203689929160070,131201286936061,2
2,10156154318350066,"123624513983,138492335404,153080620724,1655839...",1231
3,10153173137243566,147823328841240944029279128241711025855990,151
4,708873579225524,"354522044588660,236963176319804,18468761129,15...",2413
5,10202665732883918,"9208539755,59306617060,31732483895,25396865800...",1111111
6,446300038853372,"407182479403663,34226391050,289079127883350,22...",121111
7,10206347422672730,"407570359384477,354522044588660,34693706539935...","1,1,1,2,2,1,3,3,1,1,2,1,1,1,2,1,4,1,2,1,1,1,1,..."
8,10202521135909391,21898300328165583971161122560544447547,411
9,932261546827748,"610045389164725,43179984254,369527829841639,36...","9,15,2,1,49,1,1,2,1,7,2,1,2,10,1,1,4,1,5,1,1,1..."


In [8]:
page_info = pd.read_csv("input/page_info.csv")
page_info

Unnamed: 0,page_id,page_name,type,type_sub,page_url,politician_name,party,chamber,state,district_rep,main_page,post_count
0,184179159166,Governor Robert Bentley,figure,politician,https://www.facebook.com/GovernorRobertBentley/,Robert J. Bentley,Republican,Governor,Alabama,Alabama,1.0,116.0
1,50850514797,Senator Richard Shelby,figure,politician,https://www.facebook.com/RichardShelby/,Richard Shelby,Republican,Senate,Alabama,Alabama,1.0,100.0
2,119152728153461,Ron Crumpton for U.S. Senate,figure,politician,https://www.facebook.com/Crumpton2016/,Ron Crumpton,Democratic,Senate,Alabama,Alabama,1.0,1.0
3,1374832002773140,U.S. Representative Bradley Byrne,figure,politician,https://www.facebook.com/RepByrne/,Bradley Byrne,Republican,House,Alabama,Alabama 1,0.0,
4,113184250589,Bradley Byrne,figure,politician,https://www.facebook.com/byrneforalabama/,Bradley Byrne,Republican,House,Alabama,Alabama 1,1.0,173.0
5,119448323032,Martha Roby for Congress,figure,politician,https://www.facebook.com/Martha-Roby-for-Congr...,Martha Roby,Republican,House,Alabama,Alabama 2,0.0,58.0
6,174519582574426,Representative Martha Roby,figure,politician,https://www.facebook.com/Representative.Martha...,Martha Roby,Republican,House,Alabama,Alabama 2,1.0,184.0
7,167124697074779,"Nathan Mathis for U.S. Representative, Alabama...",figure,politician,https://www.facebook.com/NathanMathis4District2/,Nathan Mathis,Democratic,House,Alabama,Alabama 2,1.0,39.0
8,427729637333509,Jesse T. Smith,figure,politician,https://www.facebook.com/JesseTSmithAL/,Jesse Smith,Democratic,House,Alabama,Alabama 3,1.0,710.0
9,168209963203416,Mike Rogers,figure,politician,https://www.facebook.com/ChairmanMikeRogers/,Mike Rogers,Republican,House,Alabama,Alabama 3,1.0,2.0


## Calculate Page by Page Matrix

In [4]:
page_page_matrix = use.fb_score(
    input_format = "user_like_page", 
    output_format = "page_page_matrix", 
    input_path = "input/user_like_page.csv",
    output_path = "temp/page_page_matrix.csv",
    overwrite_file = True) # Overwrite output if there is already a file with 
                           # the same name. Default=False.
page_page_matrix

  0%|          | 0/9948 [00:00<?, ?it/s]

start reading user like page_data


100%|█████████▉| 9947/9948 [00:01<00:00, 7813.86it/s]


start turning user_like_page to page_page_matrix


100%|██████████| 1277/1277 [00:41<00:00, 30.93it/s]


done writing page score data:  page_page_matrix.csv


page_id,10018702564,100450643330760,100503403440061,1005288206233641,101043269988443,101165966152,10128918116,10150135395755161,1015725248513375,101877489873438,...,97172997732,97212224368,97464672662,979613892126968,98658495398,990186434407571,99332606976,997108126967413,99881661864,999467420085871
10018702564,73,1,0,0,0,1,2,0,0,0,...,0,0,0,2,1,0,0,4,0,0
100450643330760,1,18,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
100503403440061,0,0,15,0,0,1,1,0,0,0,...,0,1,0,9,1,0,0,1,0,0
1005288206233641,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
101043269988443,0,0,0,0,18,0,0,0,0,1,...,0,0,1,0,0,0,0,2,0,0
101165966152,1,0,1,0,0,11,0,0,0,0,...,0,1,0,1,1,0,0,1,0,0
10128918116,2,0,1,0,0,0,47,0,0,0,...,0,2,0,7,2,0,0,0,0,0
10150135395755161,0,0,0,0,0,0,0,2,0,0,...,0,0,0,0,0,0,0,0,0,0
1015725248513375,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
101877489873438,0,0,0,0,1,0,0,0,0,11,...,0,0,0,0,0,0,0,0,0,0


## Calculate Page Score

In [5]:
page_score = use.fb_score(
    input_format = "page_page_matrix",
    output_format = "page_score",
    input_path = "temp/page_page_matrix.csv",
    output_path = "output/page_score.csv",
    page_info_path = "input/page_info.csv",
    overwrite_file = True,
    clinton_on_the_left = True, # Put liberals on the left. Default=False.
    page_id_column_index = 0) # Index of the page_id column in page_info used
                              # for the merge. Default=0.
page_score

column name:  page_id  will be changed to: page_id 
start reading page page matrix
start turning  page_page_matrix to page_score
done writing page score data:  page_score.csv


Unnamed: 0,page_id,page_name,type,type_sub,page_url,politician_name,party,chamber,state,district_rep,main_page,post_count,PC1,PC1_std,PC2,PC2_std
0,184179159166,Governor Robert Bentley,figure,politician,https://www.facebook.com/GovernorRobertBentley/,Robert J. Bentley,Republican,Governor,Alabama,Alabama,1.0,116.0,1.050940,0.113281,-5.695322,-0.866978
1,50850514797,Senator Richard Shelby,figure,politician,https://www.facebook.com/RichardShelby/,Richard Shelby,Republican,Senate,Alabama,Alabama,1.0,100.0,4.509386,0.486068,0.353980,0.053885
2,113184250589,Bradley Byrne,figure,politician,https://www.facebook.com/byrneforalabama/,Bradley Byrne,Republican,House,Alabama,Alabama 1,1.0,173.0,15.920522,1.716077,21.715493,3.305671
3,174519582574426,Representative Martha Roby,figure,politician,https://www.facebook.com/Representative.Martha...,Martha Roby,Republican,House,Alabama,Alabama 2,1.0,184.0,11.644153,1.255126,3.570251,0.543486
4,155220881193244,Congressman Mo Brooks,figure,politician,https://www.facebook.com/RepMoBrooks/,Mo Brooks,Republican,House,Alabama,Alabama 5,1.0,50.0,0.336682,0.036291,-7.148295,-1.088159
5,210333902420876,Sean Parnell,figure,politician,https://www.facebook.com/The.Official.Sean.Par...,Sean Parnell,Republican,Governor,Alaska,Alaska,1.0,27.0,3.200383,0.344970,-2.496039,-0.379963
6,407182479403663,Governor Doug Ducey,figure,politician,https://www.facebook.com/dougducey/,Doug Ducey,Republican,Governor,Arizona,Arizona,1.0,743.0,9.335382,1.006263,1.293909,0.196967
7,173347701125,Governor Jan Brewer,figure,politician,https://www.facebook.com/GovJanBrewer/,Jan Brewer,Republican,Governor,Arizona,Arizona,1.0,742.0,10.951846,1.180502,3.112590,0.473818
8,6425923706,John McCain,figure,politician,https://www.facebook.com/johnmccain/,John McCain,Republican,Senate,Arizona,Arizona,1.0,119.0,1.755906,0.189270,-5.897625,-0.897774
9,137746666253194,Ann Kirkpatrick,figure,politician,https://www.facebook.com/KirkpatrickForArizona/,Ann Kirkpatrick,Democratic,House,Arizona,Arizona 1,1.0,392.0,-9.744695,-1.050383,-0.833787,-0.126924
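
The sign of a principal component is arbitrary, so an option like `clinton_on_the_left` presumably fixes the orientation of the scores. How the package does this is not shown here. The sketch below illustrates one possible approach, with a hypothetical helper `orient_scores` and made-up page ids.

```python
def orient_scores(scores, reference_page, reference_on_left=True):
    """Flip all signs if needed so the reference page lands on the requested side."""
    ref = scores[reference_page]
    should_flip = (ref > 0) if reference_on_left else (ref < 0)
    factor = -1.0 if should_flip else 1.0
    return {page: factor * value for page, value in scores.items()}


# Hypothetical raw scores in which the reference page came out positive.
raw = {"page_x": 1.2, "page_y": -0.4, "page_ref": 0.9}
oriented = orient_scores(raw, "page_ref")
print(oriented)
```

Flipping every score by the same factor leaves the distances between pages unchanged, so only the left/right reading of the axis is affected.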


## Calculate User Score

In [14]:
user_score = use.fb_score(
    input_format = "page_score",
    output_format = "user_score",
    input_path = "output/page_score.csv",
    output_path = "output/user_score.csv",
    user_like_path = "input/user_like_page.csv",
    overwrite_file = True)
user_score

  2%|▏         | 214/9947 [00:00<00:04, 2093.69it/s]

start reading page score data
start turning page_score to user_score


100%|██████████| 9947/9947 [00:05<00:00, 1988.65it/s]


done writing user score data:  user_score.csv


user_id,user_PC1_mean,user_PC1_median
939776289401111,0.805045,0.874026
489106514599003,-0.662618,-0.532254
998166376873986,-0.219511,-0.219511
1057622107586565,-0.313328,-0.313328
10206625313337604,-0.085253,-0.085253
977302698967047,0.575423,0.575423
10205503694979081,-0.237927,-0.300443
10153038864777226,0.041955,-0.087871
2915831775234,-0.525551,-0.525551
1015929801785359,-0.161298,-0.155135


# Demo: Jump Steps

You can also jump between the steps.

For example, if you do not need `page_page_matrix`, you can go directly from `user_like_page` to `page_score`.

Or, if you only want `user_score`, you can go directly from `user_like_page` to `user_score`.

The intermediate data needed are kept in memory during the session.

Below we present some examples.

## From user_like_page to page_score

In [15]:
user_like_page_to_page_score = use.fb_score(
    input_format = "user_like_page",
    output_format = "page_score",
    input_path = "input/user_like_page.csv",
    output_path = "output/page_score_from_user_like.csv",
    page_info_path = "input/page_info.csv",
    overwrite_file = True,
    clinton_on_the_left = True,
    page_id_column_index = 0)

  0%|          | 0/9948 [00:00<?, ?it/s]

column name:  page_id  will be changed to: page_id 
start reading user like page_data


100%|█████████▉| 9947/9948 [00:01<00:00, 6965.45it/s]


start turning user_like_page to page_page_matrix


100%|██████████| 1277/1277 [00:42<00:00, 30.32it/s]


done writing page score data:  page_score_from_user_like.csv


## From user_like_page to user_score

In [16]:
user_like_page_to_user_score = use.fb_score(
    input_format = "user_like_page",
    output_format = "user_score",
    input_path = "input/user_like_page.csv",
    output_path = "output/user_score_from_user_like.csv",
    page_info_path = "input/page_info.csv",
    user_like_path = "input/user_like_page.csv",
    overwrite_file = True,
    clinton_on_the_left = True,
    page_id_column_index = 0)

  8%|▊         | 818/9948 [00:00<00:01, 8169.06it/s]

column name:  page_id  will be changed to: page_id 
start reading user like page_data


100%|█████████▉| 9947/9948 [00:01<00:00, 7443.40it/s]


start turning user_like_page to page_page_matrix


100%|██████████| 1277/1277 [00:40<00:00, 31.49it/s]


start turning  page_page_matrix to page_score


  5%|▍         | 462/9947 [00:00<00:04, 2302.23it/s]

start turning page_score to user_score


100%|██████████| 9947/9947 [00:04<00:00, 2406.26it/s]


done writing user score data:  user_score_from_user_like.csv
