# Data Arts Module 2: Social Network Evolution

In this module, we will write code to visualize the evolution of a mock social network over time. The idea is simple: we generate a randomized list of users and the friendships (connections) between them. We then use Python libraries to plot the users as nodes of a graph and the connections as edges.

In order to create this visualization, we are going to need a basic understanding of some important programming concepts. These concepts are Object Oriented Programming and List/Dictionary Comprehensions.

## What is Object Oriented Programming?

Object oriented programming, or OOP, is a way to think about data in a more tangible way. It lets us organize descriptions of objects into what we call a `class`. For example, let's say I want to write code that describes a dog. I would define a new `class` for all dogs, and the `class` would represent my idea of what it means to be a dog.

In [6]:
class Dog:
    # This is where I define what it means to be a dog.
    scientific_name = 'canis lupus familiaris'
    home_planet = 'Earth'

Cool! But now we have a slight problem. The Dog `class` needs to describe *every* dog ever, not just some of them. That means we can't specify some important things — like species, or weight, or fur color — because then the Dog `class` wouldn't really define what it means to be a dog. Instead, it would just define what it means to be some particular kind of dog.

The solution is called inheritance. We can define a more specific `class` that copies over all the information from the Dog `class`, but it also lets us go into more detail.

In [7]:
class Golden_Retriever(Dog):
    # Notice the word `Dog` in parentheses in the line above.
    # That says the Golden_Retriever class should copy the info inside the Dog class.
    species = 'golden retriever'
    fur_color = 'gold'
    
class Pekingese(Dog):
    # Notice the word `Dog` in parentheses in the line above.
    # That says the Pekingese class should copy the info inside the Dog class.
    species = 'pekingese'
    fur_color = 'white'

Now we have two more classes, which are both subclasses of the Dog `class`. That means they can access the information from the Dog `class`, but they can also go into further detail than the Dog `class` can.

In [8]:
Snoopy = Dog() # This is how we make a new dog.
Sammy = Golden_Retriever() # This is how we make a new golden retriever.
Lucy = Pekingese() # This is how we make a new pekingese.

print("Snoopy's scientific name:", Snoopy.scientific_name)
print("Sammy's scientific name:", Sammy.scientific_name)
print("Lucy's scientific name:", Lucy.scientific_name)

Snoopy's scientific name: canis lupus familiaris
Sammy's scientific name: canis lupus familiaris
Lucy's scientific name: canis lupus familiaris


In the cell above, notice how all three dogs could access the attribute `scientific_name` from the Dog `class`. Snoopy could access it because he literally is a Dog. Meanwhile, Sammy and Lucy could access it because the Golden_Retriever `class` and the Pekingese `class` both inherit from the Dog `class`.

Sammy and Lucy can also access a `fur_color` attribute, which is specific to the Golden_Retriever `class` and the Pekingese `class`. However, notice that the next cell causes an error because Snoopy is a Dog — not a Golden_Retriever or a Pekingese — and the Dog `class` has no `fur_color` attribute.

In [9]:
print("Sammy's fur color:", Sammy.fur_color)
print("Lucy's fur color:", Lucy.fur_color)
print("Snoopy's fur color:", Snoopy.fur_color)

Sammy's fur color: gold
Lucy's fur color: white


AttributeError: 'Dog' object has no attribute 'fur_color'
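
If you are not sure whether an object has a particular attribute, Python's built-in `hasattr` and `getattr` let you check before reaching for it. The sketch below redefines small versions of the classes above so that it runs on its own. The lowercase variable names and the fallback string `'unknown'` are illustrative choices, not part of the lab code.

```python
class Dog:
    scientific_name = 'canis lupus familiaris'

class Golden_Retriever(Dog):
    fur_color = 'gold'

snoopy = Dog()
sammy = Golden_Retriever()

# hasattr returns True or False instead of raising an AttributeError.
print(hasattr(sammy, 'fur_color'))   # True
print(hasattr(snoopy, 'fur_color'))  # False

# getattr accepts a default value to use when the attribute is missing.
snoopy_color = getattr(snoopy, 'fur_color', 'unknown')
print("Snoopy's fur color:", snoopy_color)

# isinstance follows the inheritance chain: every Golden_Retriever is also a Dog.
print(isinstance(sammy, Dog))                # True
print(isinstance(snoopy, Golden_Retriever))  # False
```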

Okay, but what about things that are even more specific? What about things that are different for every golden retriever? Like its weight or name, for example? Now we have to define a special function inside the Golden_Retriever `class`. It's called the `__init__` function, short for 'initialize', and it gets executed immediately after we make any new golden retriever. The first parameter is always `self`, which is bound to the new golden retriever that we're creating. The other parameters can be whatever specific variables you want. We'll pick `name` and `weight` to start off.

In [11]:
class Golden_Retriever(Dog):
    def __init__(self, name, weight):
        # Remember, `self` is bound to the new golden retriever being created.
        self.name = name # Give `self` an attribute `name`, bound to the value of the parameter `name`.
        self.weight = weight # Give `self` an attribute `weight`, bound to the value of the parameter `weight`.

In [12]:
Sammy = Golden_Retriever('Sammy', 30)
print("Sammy's name is:", Sammy.name)
print("Sammy's weight is:", Sammy.weight)

Sammy's name is: Sammy
Sammy's weight is: 30


Note that those specific attributes, `name` and `weight`, are defined only for individual golden retrievers. They don't exist on the Golden_Retriever `class` itself. The next cell causes an error because the Golden_Retriever `class` has no `name` attribute. Only specific golden retrievers, like Sammy, have one.

In [13]:
print(Golden_Retriever.name)

AttributeError: type object 'Golden_Retriever' has no attribute 'name'
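
One way to see the difference between class-level and instance-level attributes is to put both in the same class. This is a small sketch, not part of the lab code. The second dog, Goldie, and her weight are made up for illustration.

```python
class Dog:
    scientific_name = 'canis lupus familiaris'

class Golden_Retriever(Dog):
    fur_color = 'gold'  # Class attribute: shared by every golden retriever.

    def __init__(self, name, weight):
        self.name = name      # Instance attribute: different for each dog.
        self.weight = weight

sammy = Golden_Retriever('Sammy', 30)
goldie = Golden_Retriever('Goldie', 28)  # Hypothetical second dog.

# Instance attributes differ from dog to dog.
print(sammy.name, goldie.name)

# Class attributes are reachable from the class itself and from every instance.
print(Golden_Retriever.fur_color, sammy.fur_color, goldie.fur_color)

# Inherited class attributes still work after we add __init__.
print(sammy.scientific_name)

# Instance attributes do not exist on the class.
print(hasattr(Golden_Retriever, 'name'))  # False
```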

This was a brief introduction to the concepts of Object Oriented Programming. We will explain how this relates to our visualization in a later part of this module. Next, we will be going over the concept of basic list/dictionary comprehensions.

## What are Lists?

Lists are a way to conveniently store data. In Python, we use square brackets to denote lists. For example, look at the next cell, where we define an empty list.

In [14]:
empty = []

Lists can store basically any value. In the next cell, you can see a list that contains the following items:
<ul>
<li>A floating point number.</li>
<li>A string.</li>
<li>A `lambda` function.</li>
<li>Another list!</li>
</ul>

In [15]:
my_list = [3.14, 'computer', lambda x: x, [7]]
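
To get values back out of a list, we index into it with square brackets. The following sketch reuses `my_list` from the cell above. The variable names on the left and the appended string are only illustrative.

```python
my_list = [3.14, 'computer', lambda x: x, [7]]

first = my_list[0]        # Indexing starts at 0, so this is 3.14.
last = my_list[-1]        # Negative indices count from the end: [7].
inner = my_list[3][0]     # Index into the nested list to get 7.
identity = my_list[2]     # The lambda is stored like any other value...
result = identity('hi')   # ...and can still be called: returns 'hi'.

my_list.append('new item')  # Lists can grow after they are created.
print(len(my_list))         # 5
```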

## What are Dictionaries?

Dictionaries are a bit like lists, because they also let us store a lot of different values. They differ in that a dictionary "translates" an input element (a key) to an output element (a value). For example, the dictionary in the next cell translates `'computer'` to `'computadora'`.

In [16]:
english_to_spanish = {'computer': 'computadora'}
print("'computer' translates to:", english_to_spanish['computer'])

'computer' translates to: computadora


Like lists, dictionaries can also store all kinds of values. The next cell has a dictionary that translates the following items:
<ul>
<li>An integer → A string.</li>
<li>A `lambda` function → An empty list.</li>
<li>`None` → Another dictionary!</li>
</ul>

In [17]:
my_dictionary = {5: 'solar', (lambda x: x) : [], None: {7: 17}}
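
A few everyday dictionary operations are sketched below, building on the `english_to_spanish` example. The extra word pair and the fallback text are made up for illustration.

```python
english_to_spanish = {'computer': 'computadora'}

# Add a new key: value pair by assigning to a key.
english_to_spanish['dog'] = 'perro'

# `in` checks whether a key is present.
print('dog' in english_to_spanish)   # True
print('cat' in english_to_spanish)   # False

# .get returns a fallback instead of raising KeyError for a missing key.
missing = english_to_spanish.get('cat', 'no translation yet')
print(missing)

# Loop over every key and value together with .items().
for english, spanish in english_to_spanish.items():
    print(english, '->', spanish)
```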

## List and Dictionary Comprehensions

Sometimes we have a bunch of values already in a list or some other iterable, and we want to extract a *new* list from it. For example, perhaps I have a list of arbitrarily chosen numbers and I want to extract a new list that contains all the same numbers, but multiplied by `10`.

In [18]:
lst = [4, 8, 15, 16, 23, 42]
# This doesn't work. It gives me a list 10 times as long!
new = lst * 10
print(new)

[4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42, 4, 8, 15, 16, 23, 42]


Writing `lst * 10` doesn't work, because that just gives us a list that's 10 times as long as the one we want. The proper solution is called a *list comprehension*. In the example below, `x` gets bound to each element in `lst`, one at a time. Then we add `x*10` to `new`, for each value that `x` takes on. First `x` gets assigned to `4`, and we add `4*10` to `new`. Then `x` gets assigned to `8`, and we add `8*10` to `new`. And so on.

In [19]:
lst = [4, 8, 15, 16, 23, 42]
# This works. It's a list comprehension.
new = [x * 10 for x in lst]
print(new)

[40, 80, 150, 160, 230, 420]
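
It can help to see that a list comprehension is shorthand for an ordinary loop. The sketch below builds the same list both ways, then shows the optional `if` clause that filters elements. The variable names are illustrative.

```python
lst = [4, 8, 15, 16, 23, 42]

# The same result as [x * 10 for x in lst], written as an explicit loop.
new_loop = []
for x in lst:
    new_loop.append(x * 10)

new_comp = [x * 10 for x in lst]
print(new_loop == new_comp)  # True

# An optional `if` at the end keeps only the elements that pass a test.
evens_times_ten = [x * 10 for x in lst if x % 2 == 0]
print(evens_times_ten)  # [40, 80, 160, 420]
```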


We can also do dictionary comprehensions. It's just like a list comprehension, but we're making a dictionary instead. Like before, `x` gets bound to each element in `lst`, one at a time. But now we add a `(key: value)` pair to `new`, instead of just an element. First `x` gets assigned to `4`, so we add the mapping `(4*10: 4+4)` to `new`. Then `x` gets assigned to `8`, so we add the mapping `(8*10: 8+4)` to `new`. And so on.

In [20]:
lst = [4, 8, 15, 16, 23, 42]
new = {x * 10: x + 4 for x in lst}
print(new)

{80: 12, 160: 20, 420: 46, 150: 19, 230: 27, 40: 8}
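
A dictionary comprehension can be unrolled into a loop in the same way. A note on the printed order: the output above comes from a version of Python in which dictionaries did not keep a fixed order. On Python 3.7 and later, dictionaries remember insertion order, so the keys print in the order they were added. The sketch below assumes such a version.

```python
lst = [4, 8, 15, 16, 23, 42]

# The same result as the comprehension, written as an explicit loop.
new_loop = {}
for x in lst:
    new_loop[x * 10] = x + 4

new_comp = {x * 10: x + 4 for x in lst}
print(new_loop == new_comp)  # True

# On Python 3.7+, keys come back in insertion order.
print(list(new_comp))  # [40, 80, 150, 160, 230, 420]
```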


That was a basic introduction to lists, dictionaries, and comprehensions in Python. Now, let's see how we are going to use these tools to create our visualization.

# Network Visualization

In [21]:
#This cell contains important imports for our visualizations

# These are imports that will help us draw the visualization
import matplotlib.pyplot as plt
import ipywidgets as widgets
%matplotlib inline


# These are general utility imports
import csv
import copy
import networkx as nx
from random import random


size = plt.rcParams["figure.figsize"] #Default figure size (width, height); each person's random position is scaled by it


In [22]:
"""This is the class architecture for the visualization. We have a Network class, which represents the overall network, and a
Person class, which acts as a 'template' for each person in the visualization. Similar to how the Dog class in the examples
acted as a template for dogs, the Person class takes in information unique to each person, but still maintains the same
overall form from person to person."""

class Network(nx.DiGraph):
    """Network class."""
    def __init__(self):
        nx.DiGraph.__init__(self)  #We are initializing the network as a DiGraph, which is a directed graph
        self.population = {} #The population in our network is initially nothing/empty, since we haven't added people yet!

    def add_node(self, person):
        nx.DiGraph.add_node(self, person) #We are using the library's add_node function, which adds a node to the graph with the
                                          #specific properties of the person that was passed into the function
        self.population[person.id] = person #We are adding the recently added person into our population dictionary


class Person:
    """Person class."""
    empty = set()

    def __init__(self, id, conn1, conn2, conn3, start, duration, end):  
        #This method initializes a person based on the parameters passed in, namely an ID and a list of connections (friends)
        #Additionally, each person is given a start and end time when their friendships begin and end.
        self.id = id
        self.pos = (random()*size[0], random()*size[1])
        self.potential_connections = set((conn1, conn2, conn3))
        self.start = start
        self.duration = duration
        self.end = end

    def connections(self, step): 
        #This method returns the person's connections if `step` falls inside their active period (from start up to, not including, end)
        if self.start <= step < self.end:
            return self.potential_connections
        return Person.empty
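
To see how a time window like this behaves, here is a stripped-down sketch. `TimedPerson` is a hypothetical stand-in for `Person` that drops the position and duration fields, and it assumes the window runs from `start` up to, but not including, `end`. The ID, connections, and step values are made up.

```python
class TimedPerson:
    """Hypothetical, simplified stand-in for the Person class."""
    empty = frozenset()  # Shared "no connections" value.

    def __init__(self, id, connections, start, end):
        self.id = id
        self.potential_connections = set(connections)
        self.start = start
        self.end = end

    def connections(self, step):
        # Assumed rule: connections are active only while start <= step < end.
        if self.start <= step < self.end:
            return self.potential_connections
        return TimedPerson.empty

alice = TimedPerson(id=0, connections=[1, 2, 3], start=10, end=50)

# Before the window, at its first step, at its last step, and just after it.
for step in (5, 10, 49, 50):
    print(step, sorted(alice.connections(step)))
```

Only steps 10 through 49 return the three connections. Every other step returns the shared empty set.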

In [23]:
"""This method reads each row in the csv file with the node information, creates a Person object for each row's information, 
and then adds that person to the network visualization"""

def init_network():
    network = Network() #This line creates an empty network
    with open('network.csv') as file: 
        #The following block opens the network csv file, and makes a new person for each row in the csv file 
        #Each row in the file contains information for one person. Open the CSV file if you are confused about the format.
        reader = csv.reader(file, delimiter=',')
        labels = next(reader) #Read the header row first, so the loop below only sees data rows
        for row in reader:
            row = [int(param) for param in row]
            person = Person(*row)
            network.add_node(person) #Add the newly created person to the network
    return network


blank_network = init_network()
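
The contents of `network.csv` are not shown here, so the sketch below uses a made-up two-row file held in memory with `io.StringIO`. The column names and values are assumptions, chosen only to match the order of the `Person` parameters. The real file may look different. `describe` is a hypothetical helper that stands in for `Person` to show how `*row` unpacks a list into separate arguments.

```python
import csv
import io

# Hypothetical file contents: one header row, then one row per person.
fake_file = io.StringIO(
    "id,conn1,conn2,conn3,start,duration,end\n"
    "0,1,2,3,10,40,50\n"
    "1,0,2,3,20,100,120\n"
)

reader = csv.reader(fake_file, delimiter=',')
labels = next(reader)  # Read the header row so the loop only sees data rows.

rows = []
for row in reader:
    # csv.reader yields strings, so convert each field to an integer.
    rows.append([int(param) for param in row])

print(labels)
print(rows)

def describe(id, conn1, conn2, conn3, start, duration, end):
    """Hypothetical helper showing how *row spreads a list over parameters."""
    return f"person {id}: friends {conn1}, {conn2}, {conn3}; steps {start} to {end}"

print(describe(*rows[0]))
```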

In [26]:
#This function draws the specific network depending on which step/time frame the network is in
#The list comprehensions in this block are somewhat complicated, but see if you can understand the general structure

def get_frame(step):
    network = copy.deepcopy(blank_network)
    for person in network.nodes():
        for other in person.connections(step):
            network.add_edge(person, network.population[other]) #The above block draws all the connections for each person
    pos = {person: person.pos for person in network.nodes()}
    node_connections = [len(person.connections(step)) for person in network.nodes()]
    plt.figure(3,figsize=(16,16)) 
    nx.draw(network, pos, node_size=8, cmap=plt.get_cmap('jet'),
            node_color=node_connections, width=0.4)
    plt.show()
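
The comprehensions in `get_frame` can be tried without drawing anything. This sketch uses plain Python instead of `networkx`. `MiniPerson`, `frame_data`, the positions, and the time windows are all invented for illustration, and an active window is assumed to mean `start <= step < end`.

```python
class MiniPerson:
    """Hypothetical stand-in for Person with fixed positions."""
    def __init__(self, id, pos, friends, start, end):
        self.id = id
        self.pos = pos
        self.friends = set(friends)
        self.start = start
        self.end = end

    def connections(self, step):
        # Assumed rule: friendships are active while start <= step < end.
        return self.friends if self.start <= step < self.end else set()

people = [
    MiniPerson(0, (0.0, 0.0), [1, 2], start=0, end=10),
    MiniPerson(1, (1.0, 0.5), [0], start=5, end=20),
    MiniPerson(2, (0.5, 1.0), [0, 1], start=15, end=30),
]

def frame_data(step):
    # Dictionary comprehension: map each person's id to an (x, y) position.
    # (get_frame uses the person object itself as the key; ids keep this readable.)
    pos = {p.id: p.pos for p in people}
    # List comprehension: one connection count per person, used for coloring.
    counts = [len(p.connections(step)) for p in people]
    # Nested comprehension: one (from, to) pair per active connection.
    edges = [(p.id, other) for p in people for other in sorted(p.connections(step))]
    return pos, counts, edges

pos, counts, edges = frame_data(7)
print(counts)  # [2, 1, 0]
print(edges)   # [(0, 1), (0, 2), (1, 0)]
```

At step 7, only the first two people are inside their windows, so the third contributes a count of 0 and no edges.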

In [27]:
#This line adds the slider to the graph (using Python widgets)

_ = widgets.interact(get_frame, step=widgets.IntSlider(min=0, max=300, value=0))

Play around with the network by stepping it forward and backward through time, and see how the different connections evolve! A few notes on the graph:

1. You may see a red box above your graph that looks like an error message; you can ignore it.
2. Colors represent the number of connections a node has at the current step. With the `jet` colormap, blue marks the fewest connections and red the most.
3. At any step, you can right-click the visualization to save the image as a file.

As you can see, we've created an awesome prototype of basic social network evolution. Review the lab and see how the programming methods explained at the top of the module were used in creating the visualization. See if you can play around and create your own network visualizations with different parameters!