
Social Network Mining

Mining in Social Networks

Nowadays, social networks play a very relevant role in the diffusion of information. Their users are constantly making posts about the most varied subjects, from trivialities and day-to-day events to matters of greater relevance such as politics and science. The circulation of this information has been increasing exponentially, as has the complexity of the network involved in its propagation, and as such several fields of study are dedicating themselves to solving problems related to this topic. More recently, the theme of "fake news" (false news, as the name indicates) has become a widely covered topic in the media, making its resolution a problem of great interest.

https://detiuaveiro.github.io/social-network-mining/

Web App

Web app developed using the template: https://coreui.io/react/

To run, first install dependencies:

$ cd web-app
$ npm install

Then start the web app on port 3000:

$ npm start

Instagram Bot

Requirements:

  1. instaloader

Installation

pip3 install -r requirements.txt

Usage

python3 insta.py
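
For context, a minimal sketch of how instaloader is typically used to fetch a profile's posts (the bot's actual logic lives in insta.py; the username below is only an illustrative placeholder):

import instaloader

loader = instaloader.Instaloader()                                            # anonymous session
profile = instaloader.Profile.from_username(loader.context, "some_profile")   # placeholder username
for post in profile.get_posts():                                              # iterate over the profile's public posts
    print(post.shortcode, post.date_utc)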

Control Center

Postgresql

Installation

  • On Arch Linux:
$ sudo pacman -S postgresql
  • On Debian:
$ sudo apt update; sudo apt install postgresql postgresql-contrib

Initialize the data directory and start the service:

$ sudo mkdir /var/lib/postgres/data
$ sudo chown postgres /var/lib/postgres/data
$ sudo -i -u postgres
$ initdb -D '/var/lib/postgres/data'
$ sudo systemctl start postgresql

Run database server and create credentials

$ sudo su postgres -c psql
# CREATE USER postgres WITH PASSWORD 'password';
# ALTER ROLE postgres WITH CREATEDB; 
# CREATE DATABASE policies;
# CREATE DATABASE postgres;
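
As a quick sanity check (not part of the project's scripts), the credentials and databases created above can be verified from Python, assuming psycopg2 is installed:

import psycopg2

# uses the user, password and database created in the psql session above
conn = psycopg2.connect(host="localhost", dbname="policies", user="postgres", password="password")
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()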

Mongo

Installation

  • On Arch Linux:
$ yay mongo
  • On Debian: follow the installation tutorial
Then enable the service:
$ sudo systemctl enable mongodb

Run database server and create credentials

$ mongo
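
The manual import commands further below authenticate with a MongoDB user, so credentials should be created in this shell. A minimal connectivity check from Python, assuming pymongo and placeholder user/password values matching the ones used later:

from pymongo import MongoClient

# placeholder credentials; use the ones created in the mongo shell
client = MongoClient("mongodb://user:password@localhost:27017/")
print(client.twitter.list_collection_names())    # the project keeps tweets and users in the "twitter" database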

Neo4j

Installation

  • On Debian: follow the installation tutorial, then enable the service:
$ sudo systemctl enable neo4j
  • On Arch Linux: follow the installation tutorial, or run it through Docker:
$ docker run neo4j

Run database server and create credentials

$ neo4j console 
$ cypher-shell            # to set new password
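
Once the password is set in cypher-shell, the Bolt endpoint can be verified from Python; a minimal sketch assuming the official neo4j driver and the default bolt port 7687:

from neo4j import GraphDatabase

# replace the password with the one set in cypher-shell
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    result = session.run("MATCH (n) RETURN count(n) AS nodes")
    print(result.single()["nodes"])
driver.close()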

Configure Tor (necessary on the deployment server and on the programmers' machines)

  • Installation and setup:
$ sudo apt-get install tor         # installation on Debian systems
$ sudo pacman -S tor               # installation on Arch systems
$ sudo systemctl enable tor        # on the deployment server it is recommended to enable the service instead of starting it every time the machine boots
  • On the server side, it's necessary to run a new tor service for each new bot we have:

    • For each new bot, create a file /etc/tor/torrc.{1..} with the following content (note that the ports and the number in the DataDirectory must be changed for each new bot). The bots then connect to the port defined in SocksPort:
    SocksPort 9060
    ControlPort 9061
    DataDirectory /var/lib/tor1
    
  • On the server, the bots must be run with the environment variable PROXY set to the proxy address (the default value points to localhost)

  • More info about how to configure Tor with Python on link; a rough sketch of this setup follows below
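
As a rough sketch (not the project's exact code) of how a bot can pick up the PROXY variable and route its traffic through the Tor SocksPort, assuming requests with SOCKS support (requests[socks]) and stem are installed, and that the ControlPort accepts cookie or open authentication:

import os
import requests
from stem import Signal
from stem.control import Controller

# default to the SocksPort from the torrc above; the deployment overrides it through PROXY
proxy = os.environ.get("PROXY", "socks5h://127.0.0.1:9060")
proxies = {"http": proxy, "https": proxy}

# should report IsTor: true when the traffic really goes through Tor
print(requests.get("https://check.torproject.org/api/ip", proxies=proxies).json())

# request a new circuit through the ControlPort defined in the same torrc
with Controller.from_port(port=9061) as controller:
    controller.authenticate()
    controller.signal(Signal.NEWNYM)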

Server Deploy

  • First, make a pull request on GitHub with the tag deploy containing the code we want to deploy next to the server. This will trigger the deploy workflow, which will create new images of the code to be deployed.
  • The first time, all containers must be pre-created on the server. On the server terminal, run:
$ docker container run --env-file ~/PI_2020/env_vars/rest.env --publish 7000:7000 --detach --name rest docker.pkg.github.com/detiuaveiro/social-network-mining/rest                # run the rest container
$ docker container run --env-file ~/PI_2020/env_vars/bot.env --network host --detach --name bot docker.pkg.github.com/detiuaveiro/social-network-mining/bot                # run the bot container
$ docker container run --env-file ~/PI_2020/env_vars/control_center.env --detach --name control_center docker.pkg.github.com/detiuaveiro/social-network-mining/control_center               # run the control center container 
  • A watchtower container must also be running on the server; it will automatically deploy all the images created by the deploy GitHub workflow:
$ docker run --env-file ~/PI_2020/env_vars/watchtower.env -d --name watchtower -v /var/run/docker.sock:/var/run/docker.sock -v ~/.docker/config.json:/config.json containrrr/watchtower
  • For the parlai service:
    1. First, we must have a copy of the ParlAI repository on the server where we want to deploy the service. Then, run the command:
      $ python examples/interactive.py -m transformer/polyencoder \
       -mf zoo:pretrained_transformers/model_poly/model \
       --encode-candidate-vecs true \
       --eval-candidates fixed  \
       --fixed-candidates-path data/models/pretrained_transformers/convai_trainset_cands.txt
    • ATTENTION: you must stop this process once it begins to retrain with the given candidates (we just did this step to download an already trained model).
    2. The next step is to copy the tweets.txt file with the tweet candidates to the directory ParlAI/data/models/pretrained_transformers. This file can be obtained in the directory code/backend/twitter/tweets_text/ once you run:
      $ python start_cc.py --export_tweets_text    # script in the directory code/backend/twitter of this repository; it must be run in a virtual environment with the requirements from requirements_cc.txt installed
    3. Then, copy the Dockerfile used to build the corresponding image to the server. It can be found in the directory code/backend/twitter/docker/parlai of this repository and must be placed in the ParlAI/ directory on the server.
    4. It's also necessary to copy the requirements.txt from code/backend/twitter/docker/parlai of this repository to the ParlAI/ directory on the server.
    5. At last, build the Docker image and create the corresponding container:
      $ docker build -t parlai .
      $ docker container run --publish 5555:5555 --restart always --detach --name parlai parlai

Databases Automatic Import

cd scripts
chmod +x import_databases.sh
./import_databases.sh

Databases Manual Import

MongoDB

  • Import
> mongoimport --db twitter --collection tweets --file scripts/mongodb/tweets.json -u user -p password
> mongoimport --db twitter --collection users --file scripts/mongodb/users.json -u user -p password
  • Indexation
> db.users.createIndex({id_str: 1}, { unique:true })
> db.users.createIndex({id: 1}, { unique:true })
> db.users.createIndex({screen_name: 1}, { unique:true })
> db.tweets.createIndex({id: 1}, { unique:true })
> db.tweets.createIndex({id_str: 1}, { unique:true })
> db.tweets.createIndex({protected: 1}, { unique:false })

PostgreSQL

  • Import
psql -U postgres_pi twitter -h localhost < scripts/postgresql/twitter.pgsql 
  • Modifications to the initial database

    • Add a new column for protected users on table users
    -- Add column to user table to include if it's protected or not
    ALTER TABLE users ADD COLUMN protected BOOLEAN DEFAULT False;
    • Change the id columns in PostgreSQL from int to numeric (because of possible overflow)
    ALTER TABLE logs ALTER COLUMN id_bot TYPE numeric;
    ALTER TABLE logs ALTER COLUMN target_id TYPE numeric;
    ALTER TABLE tweets ALTER COLUMN tweet_id TYPE numeric;
    ALTER TABLE tweets ALTER COLUMN user_id TYPE numeric;
    ALTER TABLE users ALTER COLUMN user_id TYPE numeric;
    ALTER TABLE policies ALTER COLUMN bots TYPE numeric[];

Neo4j

  • Import
CALL apoc.load.json("user_nodes.json")
YIELD value
MERGE (p:User {name: value.a.properties.name, id: value.a.properties.id, username: value.a.properties.username})
CALL apoc.load.json("bots_nodes.json")
YIELD value
MERGE (p:Bot {name: value.a.properties.name, id: value.a.properties.id, username: value.a.properties.username})
CALL apoc.load.json("tweets.json")
YIELD value
MERGE (p:Tweet {id: value.a.properties.id})
CALL apoc.load.json("follow_rel.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:FOLLOWS]->(u)
CALL apoc.load.json("retweet.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:RETWEETED]->(u)
CALL apoc.load.json("reply.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:REPLIED]->(u)
CALL apoc.load.json("wrote.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:WROTE]->(u)
CALL apoc.load.json("quote.json")
YIELD value
MATCH(p {id:value.start.properties.id})
MATCH(u {id:value.end.properties.id})
CREATE (p)-[:QUOTED]->(u)
  • Export:
call apoc.export.json.query("match (start) - [r:QUOTED] ->(end) return start, r, end", "quote.json")
call apoc.export.json.query("match (start) - [r:WROTE] ->(end) return start, r, end", "write.json")
call apoc.export.json.query("match (start) - [r:RETWEETED] ->(end) return start, r, end", "retweet.json")
call apoc.export.json.query("match (start) - [r:FOLLOWS] ->(end) return start, r, end", "follow_rel.json")
call apoc.export.json.query("match (start) - [r:REPLIED] ->(end) return start, r, end", "reply.json")
call apoc.export.json.query("match (a:Tweet) return a", "tweets.json")
call apoc.export.json.query("match (a:User) return a", "user_nodes.json")
call apoc.export.json.query("match (a:Bot) return a", "bots_nodes.json")
  • Indexation
// create index on user id
CREATE CONSTRAINT user_id
ON (u:User)
ASSERT u.id IS UNIQUE
// create index on tweet id
CREATE CONSTRAINT tweet_id
ON (t:Tweet)
ASSERT t.id IS UNIQUE
// create index on bot id
CREATE CONSTRAINT bot_id
ON (b:Bot)
ASSERT b.id IS UNIQUE
// create index on bot username
CREATE CONSTRAINT bot_username
ON (b:Bot)
ASSERT b.username IS UNIQUE
// create index on user username
CREATE CONSTRAINT user_username
ON (u:User)
ASSERT u.username IS UNIQUE