# Increasing my connections on LinkedIn

One of my current projects is to improve my LinkedIn page and grow my connections. <br>
Two actions that can help me with that are making new connections and producing content for my page. <br>
Today, I am going to do both at the same time. I will create a script that searches for the professors at my university in the departments I am most interested in, and collects their names and emails.

### Always starting with the libraries

In [1]:
#These first two will help me to do the web scraping
import requests
from bs4 import BeautifulSoup

#This one will help me to organize the data
import pandas as pd

### Getting the data from Areas of study

First, we request the HTML of the web page as text. Then we pass it to the BeautifulSoup library, which creates an object that parses the elements and makes it much easier to find the information we want.

In [2]:
#The URL of the Capilano areas of study page
html = 'https://www.capilanou.ca/programs--courses/search--select/explore-our-areas-of-study/'

#Requesting the html data
html_request = requests.get(html).text
#Creating a BeautifulSoup object
soup = BeautifulSoup(html_request, 'lxml')

### Defining the main function
I opened the page in my browser and used the Inspect tool to look up the HTML attributes that identify the information I wanted. <br>**Note**: I realized that a single function can collect both the available areas and the departments within those areas, so we can write less code.

In [3]:
def getContentBlocks(soup):
    '''
    Receives a soup object and returns the links
    and the names of the areas/departments
    '''
    
    # Get all the elements with this class
    divs_comp = soup.find_all(class_ = "component-cta cta-multi-column component-section")

    #For all the divs that we have found, look for the links
    links = []
    areas = []
    for div in divs_comp:
        #find_all returns an empty list when nothing matches,
        #so no None check is needed
        for result in div.find_all('a'):
            #Save the area name and the address
            areas.append(result.text)
            links.append(result.get('href'))
    
    return links, areas
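To check what the function returns without touching the network, here is a minimal sketch that runs the same logic on a made-up HTML snippet. The snippet, the `sample_*` names and the snake_case function name are my own assumptions for illustration. Only the class name comes from the real page, and the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the structure the function looks for
sample_html = """
<div class="component-cta cta-multi-column component-section">
  <a href="/areas/arts/">Study arts</a>
  <a href="/areas/business/">Study Business</a>
</div>
<div class="unrelated"><a href="/ignored/">Ignored</a></div>
"""

def get_content_blocks(soup):
    """Return (links, names) for every <a> inside the matching elements."""
    links, names = [], []
    for div in soup.find_all(class_="component-cta cta-multi-column component-section"):
        for anchor in div.find_all("a"):
            names.append(anchor.text)
            links.append(anchor.get("href"))
    return links, names

# 'html.parser' ships with Python, so this sketch does not need lxml
sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_links, sample_names = get_content_blocks(sample_soup)
print(sample_names)
print(sample_links)
```

The link inside the `unrelated` div is skipped, because only elements whose class attribute matches the full string are searched.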

### Let's see the areas that we got
So we run the function with the soup object

In [4]:
# The function returns the links and the names
areas = getContentBlocks(soup)
#Here we access the names
areas[1]

['Study arts & sciences',
 'Study Business',
 'Study Education, Health & Human Development',
 'Study fine & applied arts',
 'Study Global & Community Studies']

From these, I am interested only in the first two, so let's select the links of these areas

In [5]:
#Note that the names are at index [1], and the links at index [0]
myLinks = areas[0][:2]
myLinks

['/programs--courses/search--select/explore-our-areas-of-study/arts--sciences/',
 '/programs--courses/search--select/explore-our-areas-of-study/business--professional-studies/']

### Accessing the Schools
This function takes a link that we collected and uses it to fetch and parse the new page

In [6]:
def getLink(link):
    '''
    Receives an address and returns the BeautifulSoup object of that page
    '''
    start = "https://www.capilanou.ca/"    
    
    html_request = requests.get(start + link).text 
    soup = BeautifulSoup(html_request, 'lxml')
    
    return soup
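One detail worth noting: the links we collected already start with `/`, so plain concatenation with the base produces a double slash (`https://www.capilanou.ca//programs...`). The site seems to tolerate that, but here is a small sketch of an alternative using `urljoin` from the standard library. The `build_url` helper and `BASE_URL` constant are hypothetical names, not part of the script above.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.capilanou.ca/"  # same base address used in getLink

def build_url(link: str) -> str:
    """Join the base address with a link collected from the page."""
    return urljoin(BASE_URL, link)

# A site-relative link: the leading slash is merged, not doubled
print(build_url("/programs--courses/"))
# An absolute link is returned unchanged (example.com is a placeholder)
print(build_url("https://example.com/elsewhere/"))
```

With this, `getLink` could call `requests.get(build_url(link))` and would also cope with a page that lists a full URL instead of a relative one.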

### Using a for loop to iterate over all the areas
Here we reuse the functions that we have created in a for loop that repeats the same process for each of the selected areas and collects their schools

In [7]:
links = []
names = []
for link in myLinks:
    #For each link in myLinks, we access the page and store its content
    pageContent = getLink(link)
    #We save the result here
    result = getContentBlocks(pageContent)
    #And we split the data into two lists
    links.extend(result[0])
    names.extend(result[1])

### Let's see the results

I created a data frame to show the results in a better way

In [8]:
data = pd.DataFrame({
    "Links" : links
}, index = names)
data

| | Links |
|---|---|
| Study Humanities at CapU | /programs--courses/search--select/explore-our-... |
| Study STEM at CapU | /programs--courses/search--select/explore-our-... |
| Study Social Sciences at CapU | /programs--courses/search--select/explore-our-... |
| Study Business at CapU | /programs--courses/search--select/explore-our-... |
| Study Communication at CapU | /programs--courses/search--select/explore-our-... |
| Study Legal Studies at CapU | /programs--courses/search--select/explore-our-... |


### Improving the names of the Schools

In [9]:
index = [' '.join(name.split()[1:-2]) for name in names]

data.index = index
data

| | Links |
|---|---|
| Humanities | /programs--courses/search--select/explore-our-... |
| STEM | /programs--courses/search--select/explore-our-... |
| Social Sciences | /programs--courses/search--select/explore-our-... |
| Business | /programs--courses/search--select/explore-our-... |
| Communication | /programs--courses/search--select/explore-our-... |
| Legal Studies | /programs--courses/search--select/explore-our-... |
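The slicing in the cell above is compact, so here is a short sketch of what it does to a single label: it splits the text into words and drops the first word ("Study") and the last two ("at CapU"). The helper name `clean_school_name` is mine, added only for illustration.

```python
def clean_school_name(label: str) -> str:
    """Drop the first word and the last two words of a link label."""
    words = label.split()
    return " ".join(words[1:-2])

labels = ["Study Humanities at CapU", "Study Social Sciences at CapU"]
print([clean_school_name(label) for label in labels])

# Caveat: a label without the "at CapU" ending loses everything
print(repr(clean_school_name("Study Business")))
```

The approach relies on every label following the "Study ... at CapU" pattern, which holds for the six schools listed here.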


### Separating the ones that I am interested in

I am interested in Social Sciences (Economics), STEM and Business

In [10]:
schoolLinks = links[1:4]
schoolLinks

['/programs--courses/search--select/explore-our-areas-of-study/arts--sciences/school-of-science-technology-engineering--mathematics-stem/',
 '/programs--courses/search--select/explore-our-areas-of-study/arts--sciences/school-of-social-sciences/',
 '/programs--courses/search--select/explore-our-areas-of-study/business--professional-studies/school-of-business/']

## Diving into the Departments

Here we split the work into two approaches, because the STEM and Social Sciences web pages are organized differently from the Business page.

### Working with the Arts and Sciences part

The web pages are not exactly the same for both schools, but there is a way for us to generalize

In [11]:
names = []
links = []
for link in schoolLinks[:2]:
    page = getLink(link)
    mainDiv = page.find(id = "tab-courses")
    for address in mainDiv.find_all('a'):
        #Get the text and store it here
        text = address.text
        if "Department" in text:
            #If the text has "Department" in it, we remove it
            text = text.replace(" Department","")
        names.append(text)
        links.append(address.get('href'))
        
names

['Biology',
 'Chemistry',
 'Computing & Data Science',
 'Engineering',
 'Mathematics & Statistics',
 'Physics',
 'Anthropology',
 'Applied Behaviour Analysis (Autism)',
 'Criminology',
 'Economics',
 'Geography',
 'Political Science',
 'Psychology',
 'Sociology',
 'Women’s & Gender Studies']

### Getting the professors and instructors info

Here we join the departments from STEM and Social Sciences with the Business school. <br>
We can use the same code to search all of them.

In [12]:
depart_names = names[1:6] + [names[9]] + ['Business']
depart_links = links[1:6] + [links[9]] + [schoolLinks[-1]] #This last one is the business link

In [13]:
prof_name = []
prof_email = []
prof_depart = []
prof_profiles = []
#zip pairs each department name with its link
for department, link in zip(depart_names, depart_links):
    #Searching into the webpages
    page = getLink(link)
    div = page.find(id = 'tab-instructors')
    results = div.find_all(class_ = "alert-content")
    profiles = div.find_all(target = "_blank")
    for result in results:
        #Here we save the name, the email, and the department
        prof_name.append(result.h5.text)
        prof_email.append(result.a.get('href').replace('mailto:',""))
        prof_depart.append(department)
    for profile in profiles:
        #Here we get the page for the professors that have a profile
        prof_profiles.append(profile.get('href'))
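The email comes from the `href` of a `mailto:` link. As a minimal sketch of that step in isolation, the hypothetical helper below strips the prefix and also drops an optional `?subject=...` part. I did not see such a suffix on these pages, so that part is a defensive assumption. The addresses are placeholders.

```python
def email_from_href(href: str) -> str:
    """Extract the address from a 'mailto:' href."""
    # Remove only the first occurrence of the prefix
    address = href.replace("mailto:", "", 1)
    # Drop an optional '?subject=...' part; unchanged when there is none
    return address.split("?", 1)[0].strip()

print(email_from_href("mailto:someone@example.com"))
print(email_from_href("mailto:someone@example.com?subject=Hello"))
```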

### Creating a DataFrame with the Results

In [14]:
data = pd.DataFrame({
    "Name" : prof_name,
    "Email" : prof_email
}, index = prof_depart)
data

| | Name | Email |
|---|---|---|
| Chemistry | Angela Yee | ayee@capilanou.ca |
| Chemistry | Dan Fediw | dfediw@capilanou.ca |
| Chemistry | Mark Vaughan B.Sc. (Hons), PhD | mvaugha2@capilanou.ca |
| Chemistry | Matt Berry | matthewberry@capilanou.ca |
| Chemistry | Matthew Le Page B.Sc. (Hons), PhD | mlepage@capilanou.ca |
| ... | ... | ... |
| Business | Tammy Towill FCPA, FCMA, MBA, MA (in progress) | ttowill@capilanou.ca |
| Business | Todd Newfield B.Comm., M.Sc. JBS, JMP | tnewfiel@capilanou.ca |
| Business | Tracey Chang | traceychang2@capilanou.ca |
| Business | Victor Law | victorlaw@capilanou.ca |


Creating a text file with the emails. <br>
Then I will just need to copy and paste them when writing the email to the professors.

In [15]:
with open('prof_emails', 'w') as file:
    content = ', '.join(list(data.Email.values))
    file.write(content)
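If a professor were listed under more than one department, the same address would end up in the file twice. I have not checked whether that happens here, so take the sketch below as an optional variation: it removes duplicates while keeping the original order before writing. The function name, the file name and the sample addresses are placeholders.

```python
import os
import tempfile

def write_emails(emails, path):
    """Write the unique emails to path as one comma-separated line."""
    # dict.fromkeys drops duplicates while keeping the first-seen order
    unique = list(dict.fromkeys(emails))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(", ".join(unique))
    return unique

# Placeholder addresses, not real data
sample = ["a@example.com", "b@example.com", "a@example.com"]

# Write to a temporary folder so the demo leaves no file behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "prof_emails_demo.txt")
    write_emails(sample, path)
    with open(path, encoding="utf-8") as handle:
        saved = handle.read()

print(saved)
```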

### Final Steps

Now I will publish this script on my GitHub, and then I will publish a post on my LinkedIn page briefly explaining this project. The last step is sending an email to all these professors, talking a little about myself and my study interests, with links to the LinkedIn post and to the GitHub repository with the Python script that made all this possible.