# Tobi's Twitter Network

This project examines Tobi's Twitter network. The question we asked ourselves was: "Which people (accounts) influence Tobi's feed the most?" To answer it, we used methods from social network analysis. Our first approach was to collect all of Tobi's friends (the accounts Tobi follows) and check which of them have overlapping friends. The second approach was to build a network of all friends and friends of friends (2nd-degree friends) of Tobi and calculate the betweenness centrality of each node.

## Crawling
To crawl Twitter we used the [tweepy library](https://www.tweepy.org/). Using this library requires an API key from [Twitter Developer](https://developer.twitter.com/en). Our API keys are stored in `config.py`; there is a template in the repository.
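As a rough orientation, a `config.py` could look like the sketch below. The two variable names are taken from the authentication cell further down (`config.APIKEY` and `config.SECRET_APIKEY`); the placeholder values are assumptions and the actual template in the repository may differ.

```python
# config.py (sketch, not the repository's actual template)
# Keep the real file out of version control so the keys are not published.
APIKEY = 'your-api-key-here'            # placeholder, not a real key
SECRET_APIKEY = 'your-api-secret-here'  # placeholder, not a real secret
```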

In [16]:
import string # for printable comparison
import time # for sleeping between API calls
import datetime # get current time
import config # for apikeys
import tweepy # for crawling Twitter

First we must authenticate ourselves with our API keys:

In [17]:
# Authenticate Tweepy, a Python Library for crawling Twitter via the Twitter API
# Get Apikey from here: https://developer.twitter.com/en
auth = tweepy.AppAuthHandler(
    config.APIKEY,
    config.SECRET_APIKEY)
api = tweepy.API(auth)

## Get data from Twitter API
Now we want to look up Tobi's Twitter ID (should be `3490529422`):

In [18]:
# Get my ID
myName = 'tobiashoelzer'
myID = api.get_user(myName).id # get_user returns a huge User Object with name, id, etc.

myID

3490529422

Wow! An ID! How awesome! Now we crawl Tobi's friends, their IDs and names:

In [19]:
# Get my follows (friends)
myFriendsIDs = api.friends_ids(myID) # Returns a list with the IDs of up to 5000 friends per call
myFriendsNames = {} # Dict where each friend ID is mapped to the friend's name

# F is for Friends who do stuff together. U is for You and me. N is for Anywhere and anytime at all. Down here in the deep blue sea! - so 'f_id' means 'friend_id'
for f_id in myFriendsIDs:
    friend = api.get_user(f_id)
    f_name = ''.join(s for s in friend.name if s in string.printable) # Cleanup non-printable chars
    myFriendsNames[f_id] = f_name

# Name of the latest added friend of Tobi
myFriendsNames[myFriendsIDs[0]]

'Diana Ivanova'
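The cleanup step in the loop above keeps only characters that appear in `string.printable`, which covers ASCII letters, digits, punctuation and whitespace. The small sketch below isolates that step in a helper called `clean_name` (a hypothetical name, not part of the notebook) to show its effect: emoji are removed, but so are non-ASCII letters such as umlauts.

```python
import string

def clean_name(name):
    """Keep only the characters that appear in string.printable (ASCII only)."""
    return ''.join(ch for ch in name if ch in string.printable)

emoji_name = clean_name('Tobi 🚀')   # the emoji is dropped, the space before it stays
umlaut_name = clean_name('Günni')    # the 'ü' is dropped as well

print(repr(emoji_name))   # 'Tobi '
print(repr(umlaut_name))  # 'Gnni'
```

If names with umlauts or other non-ASCII letters matter for the later visualisation, a less aggressive filter might be preferable.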

After getting a list with the IDs of Tobi's friends, we can go deeper! We crawl all friends of Tobi's friends and store them in a dict. This may take some time: Twitter's API limits the number of calls per time window, so there is at least one 15-minute timeout after a RateLimitError. Take a cup of tea or do some sport while this executes. :slightly_smiling_face: :muscle:

In [15]:
# Get 2nd-degree follows (friends of friends)
secondGradeFriends = {}
print(f'Start crawling {len(myFriendsIDs)} Friends...')
for f_id in myFriendsIDs:
    try:
        # Get the friends of the current friend (of Tobi) and store them in the secondGradeFriends dict
        f_friends = api.friends_ids(f_id)
        secondGradeFriends[f_id] = f_friends
    except tweepy.RateLimitError:
        # Prevents a crash when Twitter's API rate limit raises an exception
        # Tries to continue again in 15 minutes.
        sleep_time = 15 * 60 + 1
        now = datetime.datetime.now()
        tend = now + datetime.timedelta(0, sleep_time)
        print(f'Crawled already {len(secondGradeFriends)}!')
        print(f'Current Time: {now.strftime("%H:%M:%S")}')
        print(f'Sleep for 15 Minutes (until {tend.strftime("%H:%M:%S")}) to avoid RateLimitErrors. ')
        time.sleep(sleep_time)
        f_friends = api.friends_ids(f_id)
        secondGradeFriends[f_id] = f_friends

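The wait-and-retry pattern from the crawling cell can also be written as a small reusable helper. The following is only a sketch under assumptions: `RateLimitReached` is a hypothetical stand-in for the library's rate-limit exception, `call_with_retry` is a made-up name, and the `sleep` parameter exists only so the demo does not really wait 15 minutes.

```python
import time

class RateLimitReached(Exception):
    """Hypothetical stand-in for the crawling library's rate-limit exception."""

def call_with_retry(fetch, wait_seconds=15 * 60 + 1, max_retries=3, sleep=time.sleep):
    """Call fetch(); on a rate-limit error wait and try again, up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimitReached:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(wait_seconds)

# Demo: a fake fetch that fails twice before succeeding,
# and a fake sleep that only records how long it was asked to wait.
calls = {'count': 0}
waits = []

def flaky_fetch():
    calls['count'] += 1
    if calls['count'] <= 2:
        raise RateLimitReached()
    return [1, 2, 3]

result = call_with_retry(flaky_fetch, sleep=waits.append)
print(result, waits)
```

Unlike the cell above, this version also guards the retried call itself, so a second rate-limit error does not crash the crawl. tweepy's `API` constructor additionally accepts a `wait_on_rate_limit` argument, which may make a hand-written loop unnecessary.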

## Transforming data
After waiting for Twitter to hand over the data, we can now create an edge list and a node list. The first approach counts the overlapping friends of Tobi's friends. E.g. Tobi has three friends: Ole, Christopher and Philipp. Ole and Christopher both follow (are friends with) Barack Obama and Elon Musk, so there would be an edge `Ole <--2--> Christopher`. Ole and Philipp both follow Michael Reeves, Alexandria Ocasio-Cortez and Greta Thunberg, so this edge would be `Ole <--3--> Philipp`. After counting, all edges are saved to a file called `my-edgy-friends.edges` and all nodes are saved to `my-nody-friends.nodes`.
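To make the weights in this example concrete, here is a minimal sketch that reproduces them with set intersections. The dict `toy_friends` and the function `overlap_edges` are hypothetical names used only for illustration; the data simply mirrors the Ole, Christopher and Philipp example.

```python
from itertools import combinations

# Toy data mirroring the example: friend -> accounts that this friend follows
toy_friends = {
    'Ole': ['Barack Obama', 'Elon Musk', 'Michael Reeves',
            'Alexandria Ocasio-Cortez', 'Greta Thunberg'],
    'Christopher': ['Barack Obama', 'Elon Musk'],
    'Philipp': ['Michael Reeves', 'Alexandria Ocasio-Cortez', 'Greta Thunberg'],
}

def overlap_edges(friends_of):
    """Return (friend_a, friend_b, weight) for every pair with at least one shared friend."""
    edges = []
    # combinations() yields each unordered pair exactly once
    for a, b in combinations(friends_of, 2):
        weight = len(set(friends_of[a]) & set(friends_of[b]))
        if weight > 0:
            edges.append((a, b, weight))
    return edges

toy_edges = overlap_edges(toy_friends)
print(toy_edges)  # [('Ole', 'Christopher', 2), ('Ole', 'Philipp', 3)]
```

Christopher and Philipp share no friends in this toy data, so no edge is created between them.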

### String.join
`str.join` (e.g. `','.join`) joins a list of strings into a single string, separated by the string it is called on. E.g.:
```python
mySeparatorString = '; '
mySeparatorString.join(['Apple', 'Bee', 'Cat'])

# or

'; '.join(['Apple', 'Bee', 'Cat'])
```
Returns:
```
Apple; Bee; Cat
```
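One detail worth knowing: `join` only accepts strings, which is why the cells below wrap IDs and counts in `str(...)` before joining. A short sketch, using Tobi's ID and screen name from above as sample values:

```python
row = [3490529422, 'tobiashoelzer']  # an int ID and a str name

# Joining a list that contains an int raises a TypeError
try:
    ','.join(row)
    join_failed = False
except TypeError:
    join_failed = True

# Converting every value to str first makes the join work
line = ','.join(str(value) for value in row)
print(join_failed, line)  # True 3490529422,tobiashoelzer
```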

In [44]:
# Reform data to fit into an edge list
from itertools import combinations # for unique pairs of friends

# Opens the edge file with write access and writes a header row
# (the with-block closes the file automatically, even if an error occurs)
with open("my-edgy-friends.edges", "w") as f: # This time 'f' stands for 'file'
    f.write(','.join(['User name', 'User name', 'Number of overlapping friends (weight)']) + '\n')

    # combinations() yields every pair (i, j) of Tobi's friends exactly once,
    # so no list of already calculated combinations is needed
    for f_id_i, f_id_j in combinations(myFriendsIDs, 2):
        # Number of overlapping friends = size of the intersection of both friend sets
        same_friends = len(set(secondGradeFriends[f_id_i]) & set(secondGradeFriends[f_id_j]))

        # If overlapping friends exist, write them to the edge list [name of i, name of j, number of overlapping friends]
        if same_friends > 0:
            f.write(','.join([str(myFriendsNames[f_id_i]), str(myFriendsNames[f_id_j]), str(same_friends)]) + '\n')

In [38]:
# Write the node list: one row per friend with ID and name
with open("my-nody-friends.nodes", "w") as f: # the with-block closes the file automatically
    f.write(','.join(['User ID', 'name']) + '\n')
    for f_id in myFriendsIDs:
        f.write(','.join([str(f_id), str(myFriendsNames[f_id])]) + '\n')

Create Nodelist with 20 Friends...


The second approach uses a simple directed network. Here each edge represents a directed relationship between a 1st-degree friend of Tobi and a 2nd-degree friend of Tobi. E.g. Tobi has two friends (follows two accounts): Günni and Peter. Günni has two friends of his own, Sarah and Eli. Peter also has a friend: Erik. So there would be the edges `Günni --> Sarah`, `Günni --> Eli` and `Peter --> Erik`. The edge list is saved to the file `my-edgy-friend-network.edges`:

In [26]:
# Write one directed edge per (friend, friend of friend) pair
with open("my-edgy-friend-network.edges", "w") as f: # the with-block closes the file automatically
    f.write(','.join(['User ID', 'User ID']) + '\n')

    for f_id in secondGradeFriends:
        for f_f_id in secondGradeFriends[f_id]:
            f.write(','.join([str(f_id), str(f_f_id)]) + '\n')
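The introduction mentions calculating the betweenness centrality of each node in this network. The notebook does not show that step, so the following is only a sketch of how it could be done in plain Python with Brandes' algorithm for directed, unweighted graphs, not necessarily the method we used. The function name `betweenness_centrality` is hypothetical, the scores are unnormalized, and the extra edge `Günni --> Peter` is an assumption added to the example above so that at least one node lies between two others.

```python
from collections import deque

def betweenness_centrality(edges):
    """Unnormalized betweenness centrality of a directed, unweighted graph (Brandes' algorithm)."""
    # Build adjacency lists and register every node that appears in an edge
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])

    centrality = {node: 0.0 for node in adjacency}

    for start in adjacency:
        # Breadth-first search from 'start' that counts shortest paths
        order = []                                     # nodes by non-decreasing distance
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}   # number of shortest paths from 'start'
        distance = {node: -1 for node in adjacency}    # -1 means "not visited yet"
        path_count[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if distance[neighbour] < 0:            # first visit
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)

        # Walk back from the farthest nodes and accumulate the dependencies
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += path_count[pred] / path_count[node] * (1 + dependency[node])
            if node != start:
                centrality[node] += dependency[node]

    return centrality

example_edges = [
    ('Günni', 'Sarah'),
    ('Günni', 'Eli'),
    ('Peter', 'Erik'),
    ('Günni', 'Peter'),  # assumed extra edge, not part of the example above
]
scores = betweenness_centrality(example_edges)
print(scores)
```

In this toy network only Peter gets a score above zero (1.0), because he sits on the single shortest path from Günni to Erik. For the real edge list, a graph library or a tool such as Gephi would likely be the more practical choice.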