# Week 2

# Text Collection and Preprocessing
You should collect and preprocess some textual data to investigate what `GISMA` is and how people perceive this brand on social media. In particular, you should do the following:
- Request and receive the [About GISMA](https://www.gisma.com/school/about-us) web page using [Requests](https://requests.readthedocs.io/en/latest/).
- Extract and clean up the main content of this web page using [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
- Create a free [Twitter developer account](http://apps.twitter.com/) to get access to the Twitter API.
- Build a client application to work with the Twitter API using [Tweepy](https://docs.tweepy.org/en/stable/).
- Search for the latest tweets about `GISMA` and clean up their content.
- Considering the textual data collected from these two sources (GISMA's website and Twitter), what can you say about this brand?

In [None]:
import requests

In [None]:
# The page is public, so no credentials are needed; the timeout keeps the call from hanging indefinitely.
r = requests.get('https://www.gisma.com/why-gisma', timeout=10)

In [None]:
r.status_code

200
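A status code is easier to interpret next to its standard reason phrase. The sketch below uses only the standard library's `http.HTTPStatus`; the helper names `describe_status` and `is_success` are illustrative and are not part of Requests.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code with its standard reason phrase, e.g. '200 OK'."""
    # HTTPStatus(code) raises ValueError for codes it does not know.
    return f"{code} {HTTPStatus(code).phrase}"


def is_success(code: int) -> bool:
    """True for 2xx codes, which signal that the request succeeded."""
    return 200 <= code < 300


print(describe_status(200))
print(is_success(200))
```

With the response above, `describe_status(r.status_code)` gives the same kind of summary. Requests also provides `r.ok` and `r.raise_for_status()` for similar checks.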

In [None]:
r.headers['content-type']

'text/html; charset=utf-8'

In [None]:
r.encoding

'utf-8'

In [None]:
r.text

'<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><title>About us / Why Gisma - Gisma</title><meta name="description" content="About Us - Gisma University of Applied Sciences, which is located in Germany. Gisma offers Bachelor&#x27;s and Master&#x27;s Degree Courses in Germany with focus on helping students become exceptional leaders in their own professions."/><meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1"/><link rel="canonical" href="https://www.gisma.com/why-gisma"/><meta property="og:locale" content="en_US"/><meta property="og:type" content="article"/><meta property="og:title" content="About us / Why Gisma - Gisma"/><meta property="og:description" content="About Us - Gisma University of Applied Sciences, which is located in Germany. Gisma offers Bachelor&#x27;s and Master&#x27;s Degree Courses in Germany with focus on helping students become exceptional leaders

In [None]:
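# The page is HTML (see the content-type above), so calling r.json() directly would raise a decoding error.
data = r.json() if 'json' in r.headers['content-type'] else None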
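Decoding a body as JSON makes sense only when the server actually returned JSON. Here is a minimal sketch of that decision, based on the `content-type` header shown earlier. `parse_body` is a hypothetical helper, not a Requests function, and it relies only on the standard library.

```python
import json


def parse_body(content_type: str, body: str):
    """Return parsed JSON for JSON responses, otherwise the raw text."""
    # Drop parameters such as '; charset=utf-8' before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    return body


# An HTML body is returned unchanged; a JSON body is decoded.
print(parse_body("text/html; charset=utf-8", "<!DOCTYPE html>"))
print(parse_body("application/json", '{"brand": "Gisma"}'))
```

For the response above, the call would be `parse_body(r.headers['content-type'], r.text)`, which returns the HTML string as-is.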

## Using [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

In [None]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, 'html.parser')

In [None]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width" name="viewport"/>
  <title>
   About us / Why Gisma - Gisma
  </title>
  <meta content="About Us - Gisma University of Applied Sciences, which is located in Germany. Gisma offers Bachelor's and Master's Degree Courses in Germany with focus on helping students become exceptional leaders in their own professions." name="description"/>
  <meta content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1" name="robots"/>
  <link href="https://www.gisma.com/why-gisma" rel="canonical"/>
  <meta content="en_US" property="og:locale"/>
  <meta content="article" property="og:type"/>
  <meta content="About us / Why Gisma - Gisma" property="og:title"/>
  <meta content="About Us - Gisma University of Applied Sciences, which is located in Germany. Gisma offers Bachelor's and Master's Degree Courses in Germany with focus on helping students become exceptional leaders in the

In [None]:
soup.title

<title>About us / Why Gisma - Gisma</title>

In [None]:
soup.title.name

'title'

In [None]:
soup.title.string

'About us / Why Gisma - Gisma'

In [None]:
soup.title.parent.name

'head'

In [None]:
soup.p

<p class="navigation-menu__sub-description">Discover more about Gisma University of Applied Sciences.</p>

In [None]:
soup.p['class']

['navigation-menu__sub-description']

In [None]:
soup.a

<a class="b1htnswt is-style-level-1" href="https://apply.gisma.com/login" level="1"><span>Apply now</span></a>

In [None]:
soup.find_all('a')

[<a class="b1htnswt is-style-level-1" href="https://apply.gisma.com/login" level="1"><span>Apply now</span></a>,
 <a class="b1htnswt is-style-level-2" href="/enquire-now" level="2"><span>Enquire Now</span></a>,
 <a aria-current="page" class="navigation-menu__sub-link" data-active="" data-radix-collection-item="" href="/why-gisma">About us / Why Gisma</a>,
 <a class="navigation-menu__sub-link" data-radix-collection-item="" href="/faculty-and-team">Faculty And Team</a>,
 <a class="navigation-menu__sub-link" data-radix-collection-item="" href="/life-at-gisma">Life at Gisma</a>,
 <a class="navigation-menu__sub-link" data-radix-collection-item="" href="/life-at-gisma/career-centre">Career Centre</a>,
 <a class="navigation-menu__sub-link" data-radix-collection-item="" href="/life-at-gisma/student-experience">Student Experience</a>,
 <a class="navigation-menu__sub-link" data-radix-collection-item="" href="/life-at-gisma/frequently-asked-questions">Frequently Asked Questions</a>,
 <a class="na
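Many of the links listed above are relative (`/enquire-now`, `/why-gisma`). A small sketch of turning such `href` values into absolute, de-duplicated URLs with the standard library's `urljoin` follows; `absolute_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def absolute_links(hrefs, base_url):
    """Resolve hrefs against base_url, skipping empty values and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # <a> tags without an href yield None
            continue
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


sample = ["https://apply.gisma.com/login", "/enquire-now", "/why-gisma", "/enquire-now", None]
for link in absolute_links(sample, "https://www.gisma.com/why-gisma"):
    print(link)
```

With the parsed page, the input list could be built as `[a.get('href') for a in soup.find_all('a')]`.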

In [None]:
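# find() returns None when no element has the given id; "link3" is the id used in the Beautiful Soup tutorial.
soup.find(id="link3")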

### Cleaning up the text and extracting the main content

In [None]:
soup.smooth()

In [None]:
soup.get_text()

'About us / Why Gisma - GismaExplore GismaApply nowEnquire NowSCHOOLAbout us / Why GismaDiscover more about Gisma University of Applied Sciences.Faculty And TeamMeet our expert and experienced teaching and administrative staff.Life at GismaLearn about Gisma’s accreditations, expert faculty, and career support.Career CentreDelve into Gisma’s specialised career and employability services.Student ExperienceExplore Gisma’s visa advice, accommodation, and financial aids.Frequently Asked QuestionsFind answers to your queries and concerns.Tuition Fees and FundingExplore the array of funding options and incentives available for our programmesCorporate RelationsUncover Gisma’s strategic partnerships and talent development options.How to applyExplore our application requirements, discover how to apply online and get started todayPROGRAMMESUndergraduatePostgraduateProgrammesLOCATIONPotsdamExperience our premier, state-of-the-art campus, nestled by a lake in the serene landscapes of Brandenburg.Be
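The output above runs words together (for example `GismaExplore`) because `get_text()` joins text nodes with no separator by default. Here is a minimal sketch of one possible cleanup, assuming it is acceptable to drop `script`, `style`, and `noscript` tags and to join the remaining text with single spaces. `collapse_whitespace` and `extract_visible_text` are hypothetical helper names.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def extract_visible_text(html: str) -> str:
    """Return the page text with non-visible tags removed."""
    # Imported here so collapse_whitespace() stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(["script", "style", "noscript"]):
        tag.decompose()  # remove the tag and its contents from the tree
    # separator=" " keeps adjacent elements from being glued together.
    return collapse_whitespace(soup.get_text(separator=" ", strip=True))


print(collapse_whitespace("About us \n\n  Why   Gisma  "))
```

For the page fetched earlier, the call would be `extract_visible_text(r.text)`. Navigation and footer text would still be included, so isolating the main content may need an extra selection step that depends on the page's markup.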

## Build a client application to work with the Twitter API using [Tweepy](https://docs.tweepy.org/en/stable/install.html)

In [None]:
import tweepy

In [None]:
import os

# Never hard-code credentials in a notebook. The variable name TWITTER_BEARER_TOKEN is an assumption; set it in your environment first.
bearer_token = os.environ["TWITTER_BEARER_TOKEN"]
auth = tweepy.OAuth2BearerHandler(bearer_token)
api = tweepy.API(auth)

In [None]:
client = tweepy.Client(bearer_token)

In [None]:
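# tweepy.API wraps the v1.1 endpoints, which this access level does not cover (hence the 403 below).
user = api.get_user(screen_name="Chrisjjm_13")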

Forbidden: 403 Forbidden
453 - You currently have access to a subset of Twitter API v2 endpoints and limited v1.1 endpoints (e.g. media post, oauth) only. If you need access to this endpoint, you may need a different access level. You can learn more here: https://developer.twitter.com/en/portal/product
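The 403 above comes from a v1.1 endpoint. Below is a sketch of the remaining steps (searching recent tweets and cleaning their text) using Tweepy's v2 `Client`. It assumes the bearer token is stored in an environment variable named `TWITTER_BEARER_TOKEN` (an assumed name) and that the account's access level permits recent search, which the error message does not guarantee. `clean_tweet` and `search_and_clean` are hypothetical helpers, and the cleaning rules are one reasonable choice rather than a standard.

```python
import html
import re
from typing import List


def clean_tweet(text: str) -> str:
    """Strip links, retweet prefixes, mentions and '#' signs from a tweet."""
    text = html.unescape(text)                  # '&amp;' -> '&'
    text = re.sub(r"https?://\S+", "", text)    # drop links
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)  # drop a leading 'RT @user:'
    text = re.sub(r"@\w+", "", text)            # drop remaining mentions
    text = text.replace("#", "")                # keep the hashtag word itself
    return re.sub(r"\s+", " ", text).strip()    # tidy leftover whitespace


def search_and_clean(query: str, max_results: int = 10) -> List[str]:
    """Fetch recent tweets for query (needs network and API access)."""
    # Imported here so clean_tweet() can be used without tweepy installed.
    import os
    import tweepy

    # Assumption: the token lives in the TWITTER_BEARER_TOKEN variable.
    client = tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])
    response = client.search_recent_tweets(query=query, max_results=max_results)
    # response.data is None when nothing matches the query.
    return [clean_tweet(tweet.text) for tweet in (response.data or [])]


print(clean_tweet("RT @someone: Studying at #GISMA &amp; loving it https://t.co/abc123"))
```

A call such as `search_and_clean("GISMA")` would return the cleaned texts if the access level allows it; otherwise Tweepy raises the same kind of `Forbidden` error shown above.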