# Cleaning Quiz: Udacity's Course Catalog
It's your turn! Udacity's [course catalog page](https://www.udacity.com/courses/all) has changed since the last video was filmed. One notable change is the introduction of _schools_.

In this activity, you're going to perform similar actions with BeautifulSoup to extract the following information from each course listing on the page:
1. The course name - e.g. "Data Analyst"
2. The school the course belongs to - e.g. "School of Data Science"

**Note: All solution notebooks can be found by clicking on the Jupyter icon on the top left of this workspace.**

### Step 1: Get text from Udacity's course catalog web page
You can use the `requests` library to do this.

Printing all the JavaScript, CSS, and text may overload the space available to load this notebook, so avoid printing the full response. The output shown below is truncated.
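One way to confirm the request worked without flooding the notebook is to print only a short preview of the response. This is a minimal sketch: `preview` is a hypothetical helper (not part of `requests`), and `sample_html` is a made-up stand-in for `r.text` so the snippet runs without a network call.

```python
def preview(text, limit=200):
    """Return at most `limit` characters of `text`, marking any cut with '...'."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


# Made-up stand-in for r.text (assumption: the real page is far longer)
sample_html = "<!DOCTYPE html><html><head></head><body>" + "x" * 1000 + "</body></html>"

print(preview(sample_html, limit=40))
print("Total characters:", len(sample_html))
```

In the notebook you would call `preview(r.text)` after the fetch. Calling `r.raise_for_status()` first raises an exception on a 4xx or 5xx response, so a failed request does not get parsed as if it were the catalog.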

In [36]:
# import statements
import requests
from bs4 import BeautifulSoup
import re

In [37]:
# fetch web page
r = requests.get('https://www.udacity.com/courses/all')
print(r.text)

<!DOCTYPE html><html lang="en-US"><head><meta charSet="UTF-8"/><script type="text/javascript">window.NREUM||(NREUM={}),__nr_require=function(t,n,e){function r(e){if(!n[e]){var o=n[e]={exports:{}};t[e][0].call(o.exports,function(n){var o=t[e][1][n];return r(o||n)},o,o.exports)}return n[e].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<e.length;o++)r(e[o]);return r}({1:[function(t,n,e){function r(t){try{s.console&&console.log(t)}catch(n){}}var o,i=t("ee"),a=t(15),s={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(s.console=!0,o.indexOf("dev")!==-1&&(s.dev=!0),o.indexOf("nr_dev")!==-1&&(s.nrDev=!0))}catch(c){}s.nrDev&&i.on("internal-error",function(t){r(t.stack)}),s.dev&&i.on("fn-err",function(t,n,e){r(e.stack)}),s.dev&&(r("NR AGENT IN DEVELOPMENT MODE"),r("flags: "+a(s,function(t,n){return t}).join(", ")))},{}],2:[function(t,n,e){function r(t,n,e,r,s){try{p?p-=1:o(s||new UncaughtException(t,n,e),!0)}catch(f){tr

### Step 2: Use BeautifulSoup to remove HTML tags
Use `"lxml"` rather than `"html5lib"`.

Again, printing this entire result may overload the space available to load this notebook, so the output shown below is truncated.

In [38]:
soup = BeautifulSoup(r.text, 'lxml')

In [108]:
print(soup.get_text())

Explore our Programs and Courses | Udacity CatalogProgramsCareersFor EnterpriseFor GovernmentSign InGet StartedProgramsCareersFor EnterpriseFor GovernmentSign InGet StartedProgram CatalogSearchFilter bySelect Program DetailsTypeProgramsFree CoursesSkill LevelBeginnerIntermediateAdvancedEstimated Duration<1 Month1 - 3 Months3+ MonthsIndustry SkillsSkillsAndroidAndroid DevelopmentAndroid StudioArtificial IntelligenceAWSBootstrappingBusiness StrategyC++Career AdvancementCloudFormationCNNsComputer VisionContinuous DeliveryContinuous IntegrationControlControl FlowCore DataCryptographyCSSData AnalysisData ModelingData ScienceData StructuresData VisualizationData WranglingDebuggingDeep LearningDockerExcel & SpreadsheetsFirebaseFlaskFunctionsGame DevelopmentGoogle AnalyticsHTMLImage ClassificationInterview practiceiOSJavaJavaScriptKotlinKubernetesLocalizationMachine LearningMatplotlibMemory ManagementMicroservicesMongoDBMySQLNatural Language ProcessingNavigationNetworkingNeural NetworksObject 
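The text above runs together ("CatalogProgramsCareers") because `get_text()` joins adjacent strings with nothing between them. A small sketch of the `separator` and `strip` arguments, using a made-up fragment and the built-in `"html.parser"` so it runs without extra installs (the notebook itself uses `"lxml"`):

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for the catalog navigation
html = "<div><a>Programs</a><a>Careers</a><a>For Enterprise</a></div>"
soup_demo = BeautifulSoup(html, "html.parser")

# Default: strings are concatenated with no separator
run_together = soup_demo.get_text()

# separator inserts a string between pieces; strip=True trims each piece
spaced = soup_demo.get_text(separator=" ", strip=True)

print(run_together)
print(spaced)
```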

### Step 3: Find all course summaries
Use BeautifulSoup's `find_all` method to select elements by tag type and class name. Just like in the video, you can right-click an item on the web page and click "Inspect" to view its HTML.
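As a rough illustration of selecting by tag type and class name, the sketch below runs `find_all` on a made-up fragment. The class names `card`, `featured`, and `nav-item` and the course "Data Engineer" are invented for the example and are not taken from the Udacity page.

```python
from bs4 import BeautifulSoup

# Made-up fragment; the real page uses different class names
html = """
<ul>
  <li class="card featured"><h2>Data Analyst</h2></li>
  <li class="card"><h2>Data Engineer</h2></li>
  <li class="nav-item">Careers</li>
</ul>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Both forms select <li> tags whose class attribute includes "card";
# a tag with several classes matches if any one of them is "card"
by_dict = soup_demo.find_all("li", {"class": "card"})
by_keyword = soup_demo.find_all("li", class_="card")

print(len(by_dict), len(by_keyword))
print([li.h2.get_text() for li in by_keyword])
```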

In [109]:
#soup.title
#soup.head
soup.body

<body><div id="__next"><div class="page-us"><div class=""><div class="us_mobileMenuWrap__3PkUm"><div><a class="catalog-nav-mobile_logoLink__363Sr" href="/" title="Udacity"><svg height="30" viewbox="0 0 180 30" width="180"><g fill="none" fill-rule="evenodd"><path d="M57.6 17.239187c0 2.546698-2.085517 4.407747-4.965517 4.407747-2.88 0-4.965517-1.86105-4.965517-4.407747V7.835994H45.68276v9.501143c0 3.330298 2.88 6.170846 6.951723 6.170846 4.071724 0 6.951724-2.938498 6.951724-6.170846V7.835994H57.6v9.403193zm16.286897-9.403193h-5.36276v15.67199h5.36276c4.468965 0 7.547586-3.03645 7.547586-7.835996 0-4.701596-3.07862-7.835994-7.547586-7.835994zm-.19862 13.71299h-3.177932V9.794994h3.17793c3.376553 0 5.76 2.252847 5.76 5.876994 0 3.917998-2.482757 5.779046-5.76 5.876996zm43.49793.29385c-3.575172 0-5.95862-2.644648-5.95862-6.170846 0-3.526197 2.482758-6.072895 5.95862-6.072895 2.78069 0 4.468966 1.5672 4.468966 1.5672l.794483-1.3713s-1.787587-1.86105-5.46207-1.86105c-4.766896 0-7.746206 3.52

In [99]:
soup.find_all('li')

[<li><button class="catalog-nav-mobile_exploreButton__37pG0"><span class="catalog-nav-mobile_exploreLabel__1rHoH">Programs</span><div class="catalog-nav-mobile_exploreIcon__22qRc"><svg height="32" viewbox="0 0 32 32" width="32"><path d="M22 15.89c0 .319-.146.632-.414.82l-10.175 7.13a.868.868 0 0 1-.499.16c-.503 0-.912-.438-.912-.98V8.98c0-.186.049-.367.141-.524.27-.457.834-.593 1.26-.304l10.174 6.91c.11.075.205.174.276.292.1.165.149.35.149.533v.002z" fill="#ffffff" fill-rule="evenodd"></path></svg></div></button></li>,
 <li><button class="catalog-nav-mobile_exploreButton__37pG0"><span class="catalog-nav-mobile_exploreLabel__1rHoH">Careers</span><div class="catalog-nav-mobile_exploreIcon__22qRc"><svg height="32" viewbox="0 0 32 32" width="32"><path d="M22 15.89c0 .319-.146.632-.414.82l-10.175 7.13a.868.868 0 0 1-.499.16c-.503 0-.912-.438-.912-.98V8.98c0-.186.049-.367.141-.524.27-.457.834-.593 1.26-.304l10.174 6.91c.11.075.205.174.276.292.1.165.149.35.149.533v.002z" fill="#ffffff" fill-r

In [89]:
soup.find_all('a')

[<a class="catalog-nav-mobile_logoLink__363Sr" href="/" title="Udacity"><svg height="30" viewbox="0 0 180 30" width="180"><g fill="none" fill-rule="evenodd"><path d="M57.6 17.239187c0 2.546698-2.085517 4.407747-4.965517 4.407747-2.88 0-4.965517-1.86105-4.965517-4.407747V7.835994H45.68276v9.501143c0 3.330298 2.88 6.170846 6.951723 6.170846 4.071724 0 6.951724-2.938498 6.951724-6.170846V7.835994H57.6v9.403193zm16.286897-9.403193h-5.36276v15.67199h5.36276c4.468965 0 7.547586-3.03645 7.547586-7.835996 0-4.701596-3.07862-7.835994-7.547586-7.835994zm-.19862 13.71299h-3.177932V9.794994h3.17793c3.376553 0 5.76 2.252847 5.76 5.876994 0 3.917998-2.482757 5.779046-5.76 5.876996zm43.49793.29385c-3.575172 0-5.95862-2.644648-5.95862-6.170846 0-3.526197 2.482758-6.072895 5.95862-6.072895 2.78069 0 4.468966 1.5672 4.468966 1.5672l.794483-1.3713s-1.787587-1.86105-5.46207-1.86105c-4.766896 0-7.746206 3.526198-7.746206 7.738045 0 4.309797 3.07862 7.933945 7.944827 7.933945 3.773793 0 5.66069-2.5467 5.660

In [107]:
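# Find all course summaries
# This card class matched nothing in the fetched HTML (note the empty list below).
# Other class names seen on the page: nav_catalogNavLink__30IaD, nav_subSectionLink__CNYvm
summaries = soup.find_all('li', {'class': 'card-list_catalogCardListItem__aUQtx'})
summaries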

[]
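An empty list means no tag in the fetched HTML carried that exact class name. Two possible causes, neither confirmed for this page: suffixes such as `__aUQtx` look auto-generated and may differ between site builds, and content rendered by JavaScript after load never appears in `r.text`, since `requests` does not run scripts. For the first case, a hedged workaround is to match the stable prefix with `re.compile`. The fragment and the `__Zq9Xy` suffix below are made up.

```python
import re
from bs4 import BeautifulSoup

# Made-up fragment: same class prefix as the notebook, different suffix
html = """
<ul>
  <li class="card-list_catalogCardListItem__Zq9Xy">Data Analyst</li>
  <li class="card-list_catalogCardListItem__Zq9Xy">Data Engineer</li>
  <li class="nav_item__1abCd">Careers</li>
</ul>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Exact class name from the notebook: no match once the suffix changes
exact = soup_demo.find_all("li", {"class": "card-list_catalogCardListItem__aUQtx"})

# A compiled pattern is tested against each class of each tag
by_prefix = soup_demo.find_all("li", class_=re.compile(r"^card-list_catalogCardListItem"))

print(len(exact), len(by_prefix))
```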

In [91]:
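# The output below comes from an earlier run (cells were executed out of order) in which
# summaries held the nav links; selecting on this class is an assumption inferred from that output
summaries = soup.find_all('a', {'class': 'nav_catalogNavLink__30IaD'})
print(summaries[0])
print('Number of Courses:', len(summaries))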

<a class="nav_catalogNavLink__30IaD nav_subSectionLink__CNYvm" href="/courses/ai-for-business-leaders--nd054">AI for Business Leaders</a>
Number of Courses: 96


In [96]:
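# Extract course title
# select_one is a method and must be called, e.g. select_one('h2'); subscripting it
# (select_one['h2']) raises TypeError: 'method' object is not subscriptable.
# The <a> printed above has no child tags, so its own text is the title.
summaries[0].get_text().strip()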

In [None]:
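# Extract school names (select_one is a method, so call it; it returns None if no 'h3' exists)
school = summaries[0].select_one('h3')
school.get_text().strip() if school is not None else None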

### Step 4: Inspect the first summary to find selectors for the course name and school
Tip: `.prettify()` is a helpful BeautifulSoup method that outputs HTML in a nicely indented form. Use `print()` so the whitespace is displayed properly.
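A short sketch of what `.prettify()` does, on a made-up course card (the tag layout, the `card` class, and the `href` are assumptions, not the real page structure):

```python
from bs4 import BeautifulSoup

# Made-up course card; inspect the real page for its actual structure
html = '<li class="card"><a href="/course/x"><h2>Data Analyst</h2><h3>School of Data Science</h3></a></li>'
summary_demo = BeautifulSoup(html, "html.parser").li

# prettify() returns a string with one tag or text node per line, indented by depth
pretty = summary_demo.prettify()
print(pretty)
```

Evaluating `pretty` as the last expression of a cell shows its `repr`, with literal `\n` sequences, which is why `print()` is the better choice here.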

In [None]:
# print the first summary in summaries


Look for selectors that contain the course title and school name text you want to extract. Then use the `select_one` method on the summary object to pull out the HTML matching those selectors. Afterwards, do some extra cleaning to isolate the names (strip the unnecessary HTML and whitespace), as you saw in the last video.
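The select-then-clean pattern can be sketched as follows. The selectors `h2.card__title` and `h3.card__school` are hypothetical; the real ones have to be read off the prettified summary.

```python
from bs4 import BeautifulSoup

# Made-up summary with stray whitespace around the title
html = """
<li class="card">
  <h2 class="card__title">
      Data Analyst
  </h2>
  <h3 class="card__school">School of Data Science</h3>
</li>
"""
summary_demo = BeautifulSoup(html, "html.parser").li

# select_one takes a CSS selector and returns the first match, or None
title_tag = summary_demo.select_one("h2.card__title")
school_tag = summary_demo.select_one("h3.card__school")
missing = summary_demo.select_one("h4")  # nothing matches, so this is None

# get_text() drops the tags; strip() removes surrounding whitespace
title = title_tag.get_text().strip()
school = school_tag.get_text().strip()

print(title, "|", school, "|", missing)
```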

In [None]:
# Extract course title


In [None]:
# Extract school


### Step 5: Collect names and schools of ALL course listings
Reuse your code from the previous step, but now in a loop to extract the name and school from every course summary in `summaries`!

In [None]:
courses = []
for summary in summaries:
    # append name and school of each summary to courses list
    pass  # placeholder so the cell runs; replace with your own code


In [None]:
# display results
print(len(courses), "course summaries found. Sample:")
courses[:20]
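One possible shape for the loop, shown on made-up data rather than the live page. The `h2` and `h3` selectors, the `card` class, and the names `summaries_demo` and `courses_demo` are assumptions for illustration; this is a sketch, not the official solution.

```python
from bs4 import BeautifulSoup

# Made-up catalog with one incomplete card
html = """
<ul>
  <li class="card"><h2>Data Analyst</h2><h3>School of Data Science</h3></li>
  <li class="card"><h2>Data Engineer</h2><h3>School of Data Science</h3></li>
  <li class="card"><h2>Untitled card</h2></li>
</ul>
"""
summaries_demo = BeautifulSoup(html, "html.parser").find_all("li", class_="card")

courses_demo = []
for summary in summaries_demo:
    title_tag = summary.select_one("h2")
    school_tag = summary.select_one("h3")
    # Skip cards missing either field instead of calling get_text() on None
    if title_tag is None or school_tag is None:
        continue
    courses_demo.append((title_tag.get_text().strip(), school_tag.get_text().strip()))

print(len(courses_demo), "course summaries found. Sample:")
print(courses_demo[:20])
```

Storing each course as a `(name, school)` tuple keeps the two fields paired.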