# Cascade Influence
This repository contains:
 - Scripts for computing cascade influence (Cas.In).
 - A small dataset for testing Cas.In.
 - A hands-on tutorial that walks you through the main steps of running Cas.In.
 
### Citation
The algorithm was introduced in this [paper](https://arxiv.org/abs/1802.09808):

**Bibtex**
```
@article{rizoiu2018debatenight,
  title={\# DebateNight: The Role and Influence of Socialbots on Twitter During the 1st US Presidential Debate},
  author={Rizoiu, Marian-Andrei and Graham, Timothy and Zhang, Rui and Zhang, Yifei and Ackland, Robert and Xie, Lexing},
  journal={arXiv preprint arXiv:1802.09808},
  year={2018}
}
```
### License
Both the dataset and the code are distributed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, a copy of which can be obtained following this link. If you require a different license, please contact us at Marian-Andrei@rizoiu.eu or Lexing.Xie@anu.edu.au.


# Run Cas.In in terminal:

### Required packages:
  - numpy
  - pandas
    
### Arguments of Cas.In:

*--cascade_path* : path to the cascade file.

*--time_decay* : the time decay coefficient. **Default**: -0.000068

*--save2csv* : whether to save the result to a CSV file. **Default**: False

### Command:
```bash
cd scripts
python3 influence.py --cascade_path path/to/file
```


# Test Dataset

### Dataset
We provide a toy dataset, SMH, for testing. It contains news about the Sydney Morning Herald.
- The data contains 20 cascades (one file per cascade).
- The data are real cascades crawled in 2017.
- The user_id values are anonymized by mapping each user to a number in a sequence (from 0 to n).
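
The anonymization idea can be sketched as follows. This is only an illustration, not the procedure actually used to build the dataset: the helper name `anonymize_user_ids`, the example handles, and the choice to number users in order of first appearance are all assumptions.

```python
def anonymize_user_ids(raw_ids):
    """Map each distinct raw user ID to an integer, in order of first appearance.

    Hypothetical helper for illustration; the ordering rule is an assumption.
    """
    mapping = {}
    anonymized = []
    for raw in raw_ids:
        # Assign the next unused integer the first time a user is seen.
        if raw not in mapping:
            mapping[raw] = len(mapping)
        anonymized.append(mapping[raw])
    return anonymized, mapping


# Made-up handles; "alice" appears twice and keeps the same ID.
ids, mapping = anonymize_user_ids(["alice", "bob", "alice", "carol"])
print(ids)  # [0, 1, 0, 2]
```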

### Format of cascade file:
 - A CSV file with 3 columns (time, magnitude, user_id).
 - Rows in the file are sorted by time, in seconds.
 - The initial user should start at time 0s.
 - Each following user's time is the time elapsed since the initial user.
 
Example:
```
time,magnitude,user_id
0,4674,"0"
321,1327,"1"
339,976,"2"
383,477,"3"
699,1209,"4"
824,119,"5"
835,1408,"6"
1049,896,"7"
```
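
Before running Cas.In on your own data, it may help to check a file against the format rules listed above. The following is a minimal sketch using only the standard library; `check_cascade_format` is a hypothetical helper that is not part of this repository, and it checks only the column names, the starting time, and the sort order.

```python
import csv
import io


def check_cascade_format(text):
    """Return a list of problems found in a cascade CSV given as a string.

    Hypothetical helper: it checks only the rules listed in this section.
    """
    reader = csv.reader(io.StringIO(text))
    # Strip stray whitespace so "user_id " still matches "user_id".
    header = [name.strip() for name in next(reader, [])]
    problems = []
    if header != ["time", "magnitude", "user_id"]:
        problems.append("unexpected columns: %s" % header)
        return problems
    times = [float(row[0]) for row in reader if row]
    if not times or times[0] != 0:
        problems.append("the initial user must start at time 0")
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        problems.append("rows are not sorted by time")
    return problems


good = 'time,magnitude,user_id\n0,4674,"0"\n321,1327,"1"\n339,976,"2"\n'
bad = 'time,magnitude,user_id\n5,10,"0"\n3,20,"1"\n'  # made-up invalid data
print(check_cascade_format(good))  # []
print(check_cascade_format(bad))   # two problems: start time and sort order
```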

# Cascade influence tutorial
### Preliminary
We first need to load all the packages required by cascade influence.

```bash
cd scripts
```

```python
import pandas as pd
import numpy as np
from casIn.user_influence import P, influence
```

## Compute influence in one cascade

### Read data
Read one cascade from the SMH dataset.

```python
cascade = pd.read_csv("../data/SMH/SMH-cascade-0.csv")
cascade.head()
```

|   | time | magnitude | user_id |
|---|------|-----------|---------|
| 0 | 0    | 991       | 0       |
| 1 | 127  | 1352      | 1       |
| 2 | 2149 | 2057      | 2       |
| 3 | 2465 | 1155      | 3       |
| 4 | 2485 | 1917      | 4       |


### Compute matrix P
We need to specify the time decay coefficient r. Here we choose -0.000068.

```python
p_ij = P(cascade, r=-0.000068)
```
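
To give some intuition for what the decay coefficient does, here is a minimal sketch of one way such a matrix could be built. It assumes that `p[i][j]` is the probability that event `j` was directly triggered by an earlier event `i`, proportional to the magnitude of `i` multiplied by `exp(r * (t_j - t_i))` and normalized over all earlier events. That formulation is an assumption made for illustration, and `decay_matrix` is a hypothetical function, not the repository's `P()`.

```python
import math


def decay_matrix(times, magnitudes, r=-0.000068):
    """Build a parent-probability matrix under the assumed decay model.

    Assumes times are sorted in ascending order and magnitudes are positive.
    """
    n = len(times)
    p = [[0.0] * n for _ in range(n)]
    for j in range(1, n):
        # Unnormalized weight of each earlier event i as the parent of j.
        weights = [
            magnitudes[i] * math.exp(r * (times[j] - times[i]))
            for i in range(j)
        ]
        total = sum(weights)
        for i in range(j):
            p[i][j] = weights[i] / total
    return p


# First three rows of the example cascade file shown earlier.
p = decay_matrix([0, 321, 339], [4674, 1327, 976])
print(p[0][1])            # the second event has only one possible parent
print(p[0][2], p[1][2])   # parent probabilities of the third event
```

With a negative `r`, older events receive exponentially smaller weights, so a more negative coefficient shifts probability toward recent events.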

### Compute user influence and matrix M
The function *influence()* returns an array of influence values and the matrix M.

```python
inf, m_ij = influence(p_ij)
```
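
As an illustration of how influence can be derived from a parent-probability matrix, the sketch below assumes that `m[i][j]` accumulates the probability that event `j` descends from event `i` through any chain of events, and that the influence of an event is the corresponding row sum. Both the hypothetical function `total_influence` and the hand-written toy matrix are assumptions for illustration; this is not the repository's `influence()`.

```python
def total_influence(p):
    """Compute influence values and the descendant matrix from matrix p.

    Hypothetical sketch: p[k][j] is the probability that k is the direct
    parent of j, and is zero unless k < j.
    """
    n = len(p)
    m = [[0.0] * n for _ in range(n)]
    for j in range(n):
        m[j][j] = 1.0  # every event counts itself
        for i in range(j):
            # j descends from i if its direct parent k (i <= k < j) does.
            m[i][j] = sum(m[i][k] * p[k][j] for k in range(i, j))
    influence = [sum(row) for row in m]
    return influence, m


# Hand-written toy matrix: event 1 follows event 0; event 2 follows
# event 0 or event 1 with equal probability.
p = [
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
inf, m = total_influence(p)
print(inf)  # [3.0, 1.5, 1.0]
```

In this sketch the initial event's influence equals the number of events, since every later event ultimately descends from it.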

### Link influence with user_id

```python
cascade["influence"] = pd.Series(inf)
cascade.head()
```

|   | time | magnitude | user_id | influence |
|---|------|-----------|---------|-----------|
| 0 | 0    | 991       | 0       | 60.0      |
| 1 | 127  | 1352      | 1       | 34.59037  |
| 2 | 2149 | 2057      | 2       | 29.656122 |
| 3 | 2465 | 1155      | 3       | 13.535845 |
| 4 | 2485 | 1917      | 4       | 15.913873 |


## Compute influence over multiple cascades
### Load function
The function *casIn()* computes influence in one cascade; it combines all the steps described above.

```python
from casIn.user_influence import casIn

# Note: this assignment rebinds the name `influence` imported earlier.
influence = casIn(cascade_path="../data/SMH/SMH-cascade-0.csv", time_decay=-0.000068)
influence.head()
```

|   | time | magnitude | user_id | influence |
|---|------|-----------|---------|-----------|
| 0 | 0    | 991       | 0       | 60.0      |
| 1 | 127  | 1352      | 1       | 34.59037  |
| 2 | 2149 | 2057      | 2       | 29.656122 |
| 3 | 2465 | 1155      | 3       | 13.535845 |
| 4 | 2485 | 1917      | 4       | 15.913873 |


### Load multiple cascades
We provide 20 cascades to experiment with.

```python
cascades = []
for i in range(20):
    inf = casIn(cascade_path="../data/SMH/SMH-cascade-%d.csv" % i, time_decay=-0.000068)
    cascades.append(inf)
cascades = pd.concat(cascades)
```

### Compute user influence in multiple cascades
For users who appear in several cascades, we compute the average influence over those cascades.

```python
result = cascades.groupby("user_id").agg({"influence": "mean"})
```
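
To make the aggregation concrete, the following dependency-free sketch computes the same kind of per-user mean on a few made-up records. The helper `mean_influence` and the values are hypothetical and serve only to show what averaging over cascades means.

```python
from collections import defaultdict


def mean_influence(records):
    """Average influence per user over (user_id, influence) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user_id, value in records:
        totals[user_id] += value
        counts[user_id] += 1
    return {user_id: totals[user_id] / counts[user_id] for user_id in totals}


# Made-up values: user 0 appears in two cascades, user 1 in one.
records = [(0, 60.0), (1, 30.0), (0, 20.0)]
print(mean_influence(records))  # {0: 40.0, 1: 30.0}
```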

```python
result.head()
```

| user_id | influence |
|---------|-----------|
| 0       | 105.2     |
| 1       | 24.082416 |
| 2       | 12.556659 |
| 3       | 21.046287 |
| 4       | 16.699684 |
