
<img src=http://static.politifact.com.s3.amazonaws.com/politifact/photos/fake.png width=300>


# "Fake news"

In this last phase of the semester, we will take our skills in analyzing text, in web scraping, in network tracing, in bot mobilizing, and even in machine learning, and apply them to two case studies. The first will be so-called "fake news." Yes, the term has been so over-used as to become meaningless, but there is a circle of ideas around the term that are reasonably well-defined.

Our good friends at FirstDraft have tried to provide some [structure to the term](https://firstdraftnews.com/fake-news-complicated/), laying groundwork for research and for investigations like ours. Robyn Caplan already discussed their breakdown of mis- or disinformation, their matrix of "problematic content."<br><br>
<img src=https://firstdraftnews.com/wp-content/uploads/2017/02/FDN_7Types_Misinfo-01-1024x576.jpg width=500>
<br><br>
(Seems like red is the dominant accent color for fake news critiques.) The horizontal scale describes the intent of the author, their "intent to deceive." FirstDraft breaks this intention down further, categorizing the motivation for creating and circulating content into the eight P's -- "Poor Journalism, Parody, to Provoke or ‘Punk’, Passion, Partisanship, Profit, Political Influence or Power, and Propaganda."

When it comes to identifying fake news or misinformation, FirstDraft advocates a personal practice of skeptical reading. Given the bubbles and echo chambers we live in (remember how recommender systems work, and then there's the fact that we read a lot of news that is shared by our like-minded friends), be on guard for information that might sound too good to you. And independently verify what you are reading. 

In light of all this fakery, fact checking has taken on new importance since the election (heck, in the last 24 hours!). April 2 was the first [International Fact-Checking Day](http://factcheckingday.com/). It was initiated by Poynter and included events like a "fact-check-a-thon," tutorials on how to spot fake news, and a [Hoax-Off](http://factcheckingday.com/hoax-off). Politifact's contribution to the day was [the analysis of a recent fake news story](http://www.politifact.com/california/statements/2017/apr/02/blog-posting/websites-spread-fake-news-nancy-pelosi-was-arreste/) about [Nancy Pelosi being arrested](http://thelastlineofdefense.org/breaking-nancy-pelosi-was-just-taken-from-her-office-in-handcuffs/).

Fact checking can be a laborious process and there have been attempts to automate it. Here is a nice summary of [the state of fact checking.](https://fullfact.org/media/uploads/full_fact-the_state_of_automated_factchecking_aug_2016.pdf) But it is not clear whether fact-checking alone is able to stop the spread of disinformation. Brooke Borel for FiveThirtyEight writes about [all the forces that counteract simple fact-checking.](https://fivethirtyeight.com/features/fact-checking-wont-save-us-from-fake-news/) And ultimately, as danah boyd points out, [we are at war.](http://www.zephoria.org/thoughts/archives/2017/01/27/the-information-war-has-begun.html)

Borel cites algorithmic recommender systems and other dissemination strategies that make fake news particularly hard to stop. We have seen misinformation spread by networks of bots, [through advertising](https://www.wired.com/2016/12/fake-news-will-go-away-tech-behind-ads-wont-pay/) and through [promotion by the influential (or infamous)](http://web.archive.org/web/20161107234222/https:/twitter.com/GenFlynn/status/794000841518776320). In the second part of our investigation, we will look at how fake news spreads, dusting off our network graphing skills. We have already seen, for example, how [YouTube's recommender system deals in fake news.](https://medium.com/the-graph/youtubes-ai-is-neutral-towards-clicks-but-is-biased-towards-people-and-ideas-3a2f643dea9a)

As a preview, consider a site like [Hoaxy](http://hoaxy.iuni.iu.edu/) that uses lists of fake news content and tracks the spread of URLs on social media (um, Twitter). This is a platform that you could easily make (and make better, I'm sure). There is also a machine learning (AI) [Fake News Challenge](http://www.fakenewschallenge.org/). 

<img src=http://www.fakenewschallenge.org/assets/img/wordcloud_lg.png width=300>

>The goal of the Fake News Challenge is to explore how artificial intelligence technologies, particularly machine learning and natural language processing, might be leveraged to combat the fake news problem. We believe that these AI technologies hold promise for significantly automating parts of the procedure human fact checkers use today to determine if a story is real or a hoax.

Of course this is an incredibly hard problem, one with deep epistemological roots, and so the challenge has softened slightly to instead automate components that can be used to identify fake news. One such component is "stance detection": testing whether the headline of a story and its contents agree.

We will also look at how large companies like Facebook and Google are addressing the problem. Today, we'll see Google's ClaimReview markup, a technique for spreading fact-checking that introduces new UI into Google News -- to be released Friday! (In a recent post, danah boyd argues that [this kind of effort won't "solve" the problem.](https://backchannel.com/google-and-facebook-cant-just-make-fake-news-disappear-48f4b4e5fbe8)) Eli Pariser, of The Filter Bubble fame, has created [an open Google Doc to focus attention on possible design solutions.](https://docs.google.com/document/d/1OPghC4ra6QLhaHhW8QvPJRMKGEXT7KaZtG_7s5-UQrw/edit#heading=h.1suoz8sco476) There are great ideas here and its structure is extremely helpful. 

Finally, as a flip to automated detection, we will see how AI or machine learning might start to play a role in how fake news is generated. We are probably well-aware of companies like [Narrative Science](https://www.narrativescience.com/) that turn data into stories. 

>Narrative Science is humanizing data like never before, with technology that interprets your data, then transforms it into Intelligent Narratives at unprecedented speed and scale. With Narrative Science, your data becomes actionable—a powerful asset you can use to make better decisions, improve interactions with customers and empower your employees.

Their work is largely rule-based (think Eliza) but is rapidly incorporating learned elements. More recently, companies like [Wibbitz](http://www.wibbitz.com/) have gone farther, creating more elaborate media elements from data. Or in this case, video from text.

>Video is essential to survive in the digital media landscape. Consumers want engaging video content accessible on every device and advertising budgets are rapidly shifting to video. Wibbitz automatically creates videos from text articles in seconds, allowing our partners to scalably increase premium video inventory.

Is this the future? Mounds of misinformation created automatically, spread automatically? 

## Short fact-checking case study

In honor of the first Fact-Checking Day, Politifact had a look at a fake news story about Nancy Pelosi. Before we consult Google News for the story, let's dust off our skills with AutoComplete and see what comes up.

In [None]:
import requests

params = {
    'client': 'firefox',
    'ds': 'n',
    'q': "Nancy Pelosi"
}

# let requests build and encode the query string from our parameters
url = 'http://suggestqueries.google.com/complete/search'

r = requests.get(url, params=params)
data = r.json()

# the response is a list: the search term first, then the suggestions
search_term = data[0]
results = data[1]

for result in results:
    print(result)

Oh, that's too bad. 

If you go to Google News and [conduct a search for 'Nancy Pelosi arrested'](https://www.google.com/search?hl=en&gl=us&tbm=nws&authuser=0&q=nancy+pelosi&oq=nancy+pelosi&gs_l=news-cc.3..43j0l9j43i53.1528.5256.0.6498.18.9.2.7.4.0.147.716.7j2.9.0...0.0...1ac.1.DMePDr5D3RM#hl=en&gl=us&authuser=0&tbm=nws&q=nancy+pelosi+arrested&*) you will find a Politifact blog post (the one we are basing this little detour on). Interestingly, the headlines from places like Snopes don't indicate strongly that the story is fake (although Politifact's rather large "FAKE" PNG for this story does the trick nicely). When it comes to surfacing fact-checks on articles, Google is not "expressive" enough -- its user interface groups articles and fact-checking articles together in a flat way. I believe this is the motivation for their new markup offering that will help identify fact-checks in an article. 

Today's lesson is on so-called microformats that search engines have popularized to help them better categorize the kind of data they are indexing. [Here is a great Search Engine Land story on them.](http://searchengineland.com/schema-markup-structured-data-seo-opportunities-site-type-231077) Mike will walk you through the history of this effort, but for the moment, trust us that there is an extension to HTML that lets you encode extra information about the text on a web page. Google offers a tool to explore the structure of these additions. [Here is a view of the Politifact story](https://search.google.com/structured-data/testing-tool/u/0/#url=http%3A%2F%2Fwww.politifact.com%2Fcalifornia%2Fstatements%2F2017%2Fapr%2F02%2Fblog-posting%2Fwebsites-spread-fake-news-nancy-pelosi-was-arreste%2F)

On the left is the source of the story and on the right is an example of structured data found in the HTML page. The categories on the right indicate that the kind (type) of data is a ClaimReview (Google's idea of a fact-check), and there is an indication of what claim is being checked and by whom. Have a read over the categories. Oh and notice that there's even an error -- big, professional publishers also make mistakes! It is with this extra information that Google will be unveiling a new display mechanism for fact-checks -- we are told, Friday(ish). [Richard Gingras comments:](https://blog.google/topics/journalism-news/labeling-fact-check-articles-google-news/)

>In the seven years since we started labeling types of articles in Google News (e.g., In-Depth, Opinion, Wikipedia), we’ve heard that many readers enjoy having easy access to a diverse range of content types. Earlier this year, we added a “Local Source” Tag to highlight local coverage of major stories. Today, we’re adding another new tag, “Fact check,” to help readers find fact checking in large news stories. You’ll see the tagged articles in the expanded story box on news.google.com and in the Google News & Weather iOS and Android apps, starting with the U.S. and the U.K.

Because microdata, and metadata before it, are important for web scraping, we will go over these now. They are also going to be useful for fact-checking. Our friends at Google suggested that it might be useful to see how an organization indicates in the markup what claim is being checked and compare that to the body text. It might also be useful to see what kinds of facts are being checked. We are told that the following web sites are using Google's ClaimReview microdata markup.

1. Snopes
2. NYT fact-check
3. WaPo fact-check
4. FullFact.org
5. GossipCop
6. Politifact (maybe 80% ish)
7. Factcheck.org (more hit or miss)

With this in mind, how would you find fact-checking articles? What strategy would you follow? The text added to a web page is exactly what you saw on the righthand side of the Google "Structured Data" tool. In particular, you'll find a reference to "ClaimReview" in the source of the HTML page. So with that hint, how would you find fact-checking articles? Give it a try!
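If you want something to compare your own attempt against, here is one minimal sketch that follows the hint literally. It assumes you have already downloaded some candidate pages (say, with `requests`) into a dictionary of URL to HTML source. The function name `find_fact_checks` and the `example.com` URLs are made up for illustration, and a plain substring test like this will also flag pages that merely talk about ClaimReview.

```python
def find_fact_checks(pages):
    """Return the URLs whose HTML source mentions ClaimReview.

    pages: dict mapping URL -> HTML source (already downloaded).
    """
    return [url for url, html in pages.items() if "ClaimReview" in html]


# Hypothetical pages standing in for downloaded HTML
pages = {
    "http://example.com/fact-check": '<div itemscope itemtype="http://schema.org/ClaimReview">...</div>',
    "http://example.com/regular-story": "<p>Just a regular article.</p>",
}

fact_checks = find_fact_checks(pages)
for url in fact_checks:
    print(url)
```

From there, a list of candidate URLs could come from a site's RSS feed or sitemap, which is the part left for you to try.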

### 1. Metadata on the Web

Before we look at the ```ClaimReview``` markup that publishers are starting to put in their pages, we're going to take a quick historical view of metadata on the web. As Mark mentioned, data within web pages (the "metadata") can be a rich source of information. Most of this is used by search engines, Facebook, Twitter, etc - but it's also available for you! Let's take a quick spin through the past ~15 years and look at how metadata on the web has changed. And, we're going to write a little HTML on the way :-0

**Let's travel back in time... to the year 2000...**

Imagine that we are going to start a technology news web site and put it on the world-wide-web, information super-highway thing. We start off by creating a web page that links to some of the big technology stories of the day, like this:

In [None]:
%%HTML

<html>

    <head>
        <title>My Technology News Site</title>
    </head>

    <body>
        <div>
            <div><strong>Steve Jobs introduces the public beta of Mac OS X</strong></div>
            <div>Sept 13, 2000 - Steve Jobs <a href="https://www.apple.com/pr/library/2000/09/13Apple-Releases-Mac-OS-X-Public-Beta.html" target="_blank">introduces</a> the public beta of Mac OS X for US$29.95.</div>
            <div>Author: Michael Young</div>
        </div>
    </body>

</html>

---

We send the link to our new site to our family and friends, and we have a handful of people reading it (well, two people really: our mom and the dog).

**So, now we want more people to read it!**

What's the best way to have more people discover this site in 2000? **Search.** And, by that I mean Google (which had launched a few years earlier in 1998). Let's ignore ~~Yahoo!~~ Oath for now, but it was a real player in search in the early days of the web.

Our good friend, the SEO Guru, told us we needed to do some work on our site so that we could move up in the rankings. After talking to the "guru", we learned to help Google by telling them a little about our humble news site using the **```meta```** ```keywords``` and ```description``` tags.

In [None]:
%%HTML

<html>

    <head>
        <title>My Technology News Site</title>
        <meta name="description" content="My Technology News Site has the most interesting technology stories every day.">
        <meta name="keywords" content="tech, news, super important tech news, technology, technology news">
    </head>

    <body>
        <div>
            <div><strong>Steve Jobs introduces the public beta of Mac OS X</strong></div>
            <div>Sept 13, 2000 - Steve Jobs <a href="https://www.apple.com/pr/library/2000/09/13Apple-Releases-Mac-OS-X-Public-Beta.html" target="_blank">introduces</a> the public beta of Mac OS X for US$29.95.</div>
            <div>Author: Michael Young</div>
        </div>
    </body>

</html>

---
To review (in case web technology circa 2000 is feeling slightly remote):

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The **```keywords```** metadata is used to tell search engines about the topic of the page.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The **```description```** metadata is used to describe the site and this is what search engines use in their search results.

A **key point** to remember: none of this data is viewable by users. It's for machines.
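Since this data is for machines, here is a rough sketch of how a machine might read it. It uses Python 3's built-in `html.parser` so it runs without extra installs (the same idea works with BeautifulSoup). The class name `MetaTagParser` is our own invention, and the page is a trimmed copy of the example above.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        content = attrs.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


page = """
<html>
    <head>
        <title>My Technology News Site</title>
        <meta name="description" content="My Technology News Site has the most interesting technology stories every day.">
        <meta name="keywords" content="tech, news, super important tech news, technology, technology news">
    </head>
    <body></body>
</html>
"""

parser = MetaTagParser()
parser.feed(page)

# keywords arrive as one comma-separated string, so split them up
keywords = [k.strip() for k in parser.meta["keywords"].split(",")]

print(parser.meta["description"])
print(keywords)
```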

Here is an example of the ```description``` tag in use: 

![description](https://qph.ec.quoracdn.net/main-qimg-2c6dddd356b26ca0763241db501f52f8)

**Great!**

Google now knows a little about us and is ranking our site a bit higher in the search results for "technology news." Because of that, we have a few more people showing up at our site.

Over the next few years, we expand our little tech news site into listing events as well. 

In [None]:
%%HTML

<html>

    <head>
        <title>My Technology News Site - Events in San Francisco</title>
        <meta name="description" content="My Technology News Site has the most interesting technology stories and events.">
        <meta name="keywords" content="tech, news, super important tech news, technology, technology news, technology events, events, San Francisco, Silicon Valley">
    </head>

    <body>
        <div>
            <div><strong>Macworld Expo San Francisco</strong></div>
            <div>January 5-9, 2004</div>
            <div>Moscone Convention Center, San Francisco, CA</div>
        </div>
    </body>

</html>

---

**Questions**: how does a search engine know that this site/page is really about technology? How does it know it's an event listing? Does anyone see how this metadata approach could be abused?


### 2. Microformats

Around 2005, a group of people came up with the notion of a "Microformat." The idea was to use additional markup in HTML to allow machines to easily discover data inside HTML (like our calendar event or news story). Simply put, _Microformats are a way to use html pages as both a human readable document and machine readable data, without repetition._

The idea was originally a grassroots movement from developers but it was soon supported by some search engines and browsers. It was never part of a standards body though - just an "informal" specification. [Microformats](http://microformats.org/wiki/Main_Page) are still used and supported but as we'll see, new metadata formats came along...

Microformats allowed developers to highlight specific elements/types of content within a page, such as:

```
hAtom - blog posts and other date-stamped content
hCalendar - events
hCard - people, organizations, contacts
hListing - listings for products or services
hMedia - media info about images, video, audio
hProduct - products
hRecipe - cooking+baking recipes
hResume - individual resumes and CVs
hReview - individual reviews and ratings
hReview-aggregate - aggregate reviews and ratings
adr - address location information
geo - latitude & longitude location (WGS84 geographic coordinates)
```

**What good does this do? What can we do with Microformats?**

A few things:
1. Search engines now have help in knowing what a page, or data within a page, is about.
2. Search engines can use this markup to know what to show in something like a "rich snippet."
3. Browsers started adding the ability to do things like detect an event in a page and allow a user to add it to their calendar (or a person's information to their Address book).

Make sense? Let's revisit our event listing by using the `hCalendar` microformat.

In [None]:
%%HTML

<html>

    <head>
        <title>My Technology News Site - Events in San Francisco</title>
        <meta name="description" content="My Technology News Site has the most interesting technology stories and events.">
        <meta name="keywords" content="tech, news, super important tech news, technology, technology news, technology events, events, San Francisco, Silicon Valley">
    </head>

    <body>
        <div class="vevent">
            <div class="summary"><strong>Macworld Expo San Francisco</strong></div>
            <div>
                 <span class="dtstart" title="2004-01-05">January 5</span>-
                 <span class="dtend" title="2004-01-09">9, 2004</span>
            </div>
            <div class="location">Moscone Convention Center, San Francisco, CA</div>
        </div>
    </body>

</html>

---
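To see why these class names matter, here is a rough sketch of what a machine could do with them. It walks the markup with Python 3's built-in `html.parser` and collects the `summary`, `dtstart`, `dtend` and `location` values, preferring a `title` attribute when one is present. The class name `HCalendarParser` is ours, it handles a single event only, and real microformat parsers cover many more cases than this.

```python
from html.parser import HTMLParser

# hCalendar class names we want to pull out of a "vevent" block
HCAL_PROPERTIES = {"summary", "dtstart", "dtend", "location"}
# Void elements never get a closing tag, so they must stay off the stack
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCalendarParser(HTMLParser):
    """Collect hCalendar properties for one event into a dict."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._open = []          # one entry per open tag: property name or None
        self._has_title = set()  # properties whose value came from a title attribute

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        prop = next((c for c in classes if c in HCAL_PROPERTIES), None)
        if prop:
            # Prefer the machine-readable title over the visible text
            title = attrs.get("title")
            self.event[prop] = title or ""
            if title:
                self._has_title.add(prop)
        if tag not in VOID_TAGS:
            self._open.append(prop)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing property that has no title value
        for prop in self._open:
            if prop and prop not in self._has_title:
                self.event[prop] += data


page = """
<div class="vevent">
    <div class="summary"><strong>Macworld Expo San Francisco</strong></div>
    <div>
        <span class="dtstart" title="2004-01-05">January 5</span>-
        <span class="dtend" title="2004-01-09">9, 2004</span>
    </div>
    <div class="location">Moscone Convention Center, San Francisco, CA</div>
</div>
"""

parser = HCalendarParser()
parser.feed(page)
event = {name: value.strip() for name, value in parser.event.items()}

for name in sorted(event):
    print(name + ": " + event[name])
```

Notice that the human-readable "January 5" and the machine-readable `2004-01-05` live in the same element, which is the "without repetition" idea in action.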

### 3. Enter Microdata (and others)

Over the coming years, other metadata approaches emerged such as [Microdata](http://schema.org/) (Google and other search engines), [OpenGraph](http://ogp.me/) (Facebook), [TwitterCards](https://dev.twitter.com/cards/overview) (Twitter) and others (RDF, [RDFa](https://rdfa.info/)). These were created and driven by various standards bodies, commercial interests (publishers, social networks, search engines, browsers) and developers. Again, the goal of these was to make it easier for machines to make sense of the data published inside web pages and to use that data to help display, rank and make publishers' content easier to interact with.

We're going to focus on [Microdata](https://www.w3.org/TR/microdata/) for the rest of the class, but it's worth looking in to the others as well.

Similar to Microformats, Microdata is defined as: _This specification defines the HTML microdata mechanism. This mechanism allows machine-readable data to be embedded in HTML documents in an easy-to-write manner, with an unambiguous parsing model. It is compatible with numerous other data formats including RDF and JSON._

Though `Microdata` is not an official spec of *The W3C* (_The W3C HTML Working Group failed to find an editor for the specification and terminated its development with a 'Note'._) it is supported by Google, Microsoft, Yahoo and Yandex. In fact, these companies came together to create a vocabulary (specification, essentially) around microdata that is published at http://schema.org/. These companies have tried to establish an open forum and community-based process for updating the vocabulary/specification.

Let's look at an example of how microdata works. We'll start by looking at a **Movie**. Here is some simple HTML that displays information about the movie Avatar. Go ahead and run it.

In [None]:
%%HTML

<div>
    <h1>Avatar</h1>
    <div>Director: James Cameron (born August 16, 1954)</div>
    <div>Science Fiction</div>
    <div><a href="../movies/avatar-theatrical-trailer.html">Trailer</a></div>
</div>

---

### Adding Microdata to our HTML

We want to let Google and the search engines know that this is information about a movie.

**Step 1**: Identify which section is about the Movie 🎥

Add the **`itemscope`** attribute to the HTML element which encloses the information about the movie.

```html
<div itemscope>
    ...Movie info here...
</div>
```


In [None]:
%%HTML

<div itemscope>
    <h1>Avatar</h1>
    <div>Director: James Cameron (born August 16, 1954)</div>
    <div>Science fiction</div>
    <div><a href="../movies/avatar-theatrical-trailer.html">Trailer</a></div>
</div>

---

**Step 2**: Specify the type (i.e. this thing is a Movie)

Now, add the **`itemtype`** attribute right after the **`itemscope`** and specify the type. When specifying the type, you can use any of the types listed on [schema.org](http://schema.org/docs/full.html)

```html
<div itemscope itemtype="http://schema.org/Movie">
    ...Movie info here...
</div>
```

In [None]:
%%HTML

<div itemscope itemtype="http://schema.org/Movie">
    <h1>Avatar</h1>
    <div>Director: James Cameron (born August 16, 1954)</div>
    <div>Science fiction</div>
    <div><a href="../movies/avatar-theatrical-trailer.html">Trailer</a></div>
</div>

---

**Step 3**: Use the **`itemprop`** attribute to specify properties about the Movie.

Nothing has changed visually on the page, but we've told search engines that this section of the page is about a Movie. Google thanks you! Can we go further with the [Movie type](http://schema.org/Movie)? How would we tell Google which of these fields is the movie name, director and genre? We can do this using the **`itemprop`** attribute.


In [None]:
%%HTML

<div itemscope itemtype="http://schema.org/Movie">
    <h1 itemprop="name">Avatar</h1>
    <div>Director: James Cameron (born August 16, 1954)</div>
    <div>Science fiction</div>
    <div><a href="../movies/avatar-theatrical-trailer.html">Trailer</a></div>
</div>

### You Try It

Edit the HTML above and specify which fields are the director, genre and trailer. Reference the [Movie documentation on schema.org.](http://schema.org/Movie)
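One way to check your work, sketched here with Python 3's built-in `html.parser`: paste your edited HTML into a string and list the `itemprop` values a machine would see. The names `ItemPropParser` and `list_itemprops` are our own, and taking the value from `content` or `href` before falling back to the text is a simplification of the real microdata rules. The sample below only contains the `name` property we already added, so it won't spoil the exercise.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must stay off the stack
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ItemPropParser(HTMLParser):
    """Collect [itemprop, value] pairs in document order."""

    def __init__(self):
        super().__init__()
        self.props = []  # [name, value] pairs
        self._open = []  # one entry per open tag: index into props, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        index = None
        name = attrs.get("itemprop")
        if name:
            # Simplification: use content/href if present, else the element text
            value = attrs.get("content") or attrs.get("href")
            if value is None:
                value = ""
                index = len(self.props)
            self.props.append([name, value])
        if tag not in VOID_TAGS:
            self._open.append(index)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        for index in self._open:
            if index is not None:
                self.props[index][1] += data


def list_itemprops(html):
    """Return (itemprop, value) tuples found in the given HTML string."""
    parser = ItemPropParser()
    parser.feed(html)
    return [(name, value.strip()) for name, value in parser.props]


movie_html = """
<div itemscope itemtype="http://schema.org/Movie">
    <h1 itemprop="name">Avatar</h1>
    <div>Director: James Cameron (born August 16, 1954)</div>
    <div>Science fiction</div>
    <div><a href="../movies/avatar-theatrical-trailer.html">Trailer</a></div>
</div>
"""

for name, value in list_itemprops(movie_html):
    print(name + ": " + value)
```

Once you have added the other properties, they should show up in this list too.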


In [None]:
%%HTML



### HOLD THE PHONE!

So, Microdata is one of three formats for embedding this kind of structured data in a page. The others are [RDFa](https://rdfa.info/) and [JSON-LD](http://json-ld.org/spec/latest/json-ld/) (where "LD" is "linked data"). 

The search engines support all three formats but Google [recently said](https://developers.google.com/search/docs/guides/intro-structured-data) JSON-LD is their recommended format.

Some publishers use the Microdata HTML markup, some use JSON-LD. It's a bit of the wild west out there. For the examples in this notebook, we'll stick with the HTML markup. However, if you were to take our Movie example from above and express it as JSON-LD, it would look something like this:

```json
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Movie",
  "name": "Avatar",
  "genre": "Science Fiction",
  "director": {
    "@type": "Person",
    "name": "James Cameron"
  }
}
</script>

```
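One nice thing about JSON-LD from a scraping point of view is that, once you find the `<script>` block, the rest is ordinary JSON. Here is a minimal sketch using only Python 3's standard library. The class name `JsonLdParser` is ours, and blocks that fail to parse are silently skipped, which may or may not be what you want.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except ValueError:
                pass  # skip blocks that are not valid JSON


page = """
<html><head>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Movie",
  "name": "Avatar",
  "director": {"@type": "Person", "name": "James Cameron"}
}
</script>
</head><body></body></html>
"""

parser = JsonLdParser()
parser.feed(page)

for block in parser.blocks:
    print(block["@type"] + ": " + block["name"])
```

Compare this with picking through `itemscope` and `itemprop` attributes: the data arrives already nested, so `block["director"]["name"]` just works.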



### Let's look at a different type: LocalBusiness 🏢

Schema.org specifies a variety of different types: movies, restaurants, animal shelters, news articles, airports, vehicles...you name it. Let's look at the **```LocalBusiness```** and **```Restaurant```** types:

Here is the FourSquare page for a restaurant called Olmsted: https://foursquare.com/v/olmsted/5744efc0498e8b9ffbd0682a

Let's see if FourSquare is using microdata to highlight the restaurant information. Open the page in your browser and "View the Source." Do you see the ```itemscope``` attribute anywhere? What type did they specify?

Looking through HTML can be very messy! Let's use Google's testing tool to see what it finds in the HTML. Give this a try:

https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Ffoursquare.com%2Fv%2Folmsted%2F5744efc0498e8b9ffbd0682a

Ahh....much better!

Now, try it for the Zagat page for the same restaurant: https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Fwww.zagat.com%2Fr%2Folmsted-brooklyn


### Now that you see how it works, let's do it ourselves!

**A little refresh here**...remember how we scrape a page? Using the requests module to do an `HTTP GET` request. 

In [None]:
import requests

# grab the FourSquare page for Olmsted
url = 'https://foursquare.com/v/olmsted/5744efc0498e8b9ffbd0682a'

r = requests.get(url)
print(r.text)

Now that we have the HTML of that page, let's use our old friend `BeautifulSoup` to help parse out the **`itemscope`** attributes to see what microdata is represented in this HTML page.

In [None]:
from bs4 import BeautifulSoup

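# name the parser explicitly so results don't depend on which parsers are installed
soup = BeautifulSoup(r.text, 'html.parser')

for tag in soup.find_all(attrs={"itemscope": True}):
    print("Name: " + tag.name)
    # itemtype is optional on an itemscope element, so don't assume it's there
    print("Type: " + tag.get('itemtype', '(no itemtype)'))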

How about we wrap this up in a handy function where we pass in a URL and get back the list of schema types (if any) found on the given page?

In [None]:
import requests
from bs4 import BeautifulSoup

def get_schemas(url):

    # make the request and run the page through BeautifulSoup
    r = requests.get(url)
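    soup = BeautifulSoup(r.text, 'html.parser')
    
    schemas = []
    for tag in soup.find_all(attrs={"itemscope":True}):
        # itemtype is optional, so skip items that don't declare one
        if tag.has_attr('itemtype'):
            schemas.append(tag['itemtype'])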

    return schemas

# call our method with any URL
# news article?
#url = 'https://www.nytimes.com/2017/04/02/us/politics/trump-china-jared-kushner.html'
#url = 'http://digg.com/2017/amazon-alexa-is-not-your-friend'

# restaurant? local business?
#url = 'https://foursquare.com/v/olmsted/5744efc0498e8b9ffbd0682a'
#url = 'https://www.yelp.com/biz/olmsted-brooklyn'

# recipe?
#url = 'http://www.bonappetit.com/recipe/bas-best-molten-chocolate-cake'

# movie?
url = 'http://www.imdb.com/title/tt3521164/'

schemas = get_schemas(url)
for schema in schemas:
    print(schema)


### Let's look at the `NewsArticle` type 📰

Take a quick peek at the schema documentation before we get started: http://schema.org/NewsArticle

How would we go about extracting information about a NewsArticle using the tools we know (requests, BeautifulSoup, etc)?

In particular, can we find this information in a [news article](https://www.nytimes.com/2017/04/02/us/politics/trump-china-jared-kushner.html)?
* `headline`
* `author`
* `description`

In [None]:
import requests
from bs4 import BeautifulSoup

url = 'https://www.nytimes.com/2017/04/02/us/politics/trump-china-jared-kushner.html'

# make the request and run the page through BeautifulSoup
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

for tag in soup.find_all(attrs={'itemtype': 'http://schema.org/NewsArticle'}):

    for stuff in tag.find_all(attrs={"itemprop":True}):

        if stuff['itemprop'] == 'headline':
            print('headline: ' + stuff.get_text() + "\n")

        # "in" because the attribute may list several properties, e.g. "author creator"
        elif 'author' in stuff['itemprop']:
            print('author: ' + stuff.get_text() + "\n")

        elif stuff['itemprop'] == 'description':
            # <meta> tags carry the value in "content"; otherwise fall back to the text
            print('description: ' + stuff.get('content', stuff.get_text()) + "\n")
  

### ClaimReview

Ok, let's finally take a look at the *`ClaimReview`* microdata specification: https://schema.org/ClaimReview

As Mark mentioned earlier, `ClaimReview` is a new type added to the list of supported types (on schema.org) by [Google on October 16](https://blog.google/topics/journalism-news/labeling-fact-check-articles-google-news/).

A `ClaimReview` is defined as: _A fact-checking review of claims made (or reported) in some creative work (referenced via itemReviewed)._

Let's look at an article from the Washington Post "Fact Checker" blog to see if they have a ClaimReview. Use our method `get_schemas()` from above to print out the microdata types:

In [None]:
url = 'https://www.washingtonpost.com/news/fact-checker/wp/2016/06/17/fact-checking-three-democratic-claims-on-assault-rifles-and-guns/?utm_term=.fdbafa4d21c9'

for schema in get_schemas(url):
    print(schema)

Now, let's take a closer look at the info in the `ClaimReview`. In this example, we start to print out the item properties for this wonderful piece from snopes. 🐟

In [None]:
import requests
from bs4 import BeautifulSoup

# ummm...what?
url = 'http://www.snopes.com/bumble-bee-tuna-recall-human/'

# make the request and run the page through BeautifulSoup
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

for tag in soup.find_all(attrs={'itemtype': 'http://schema.org/ClaimReview'}):

    for stuff in tag.find_all(attrs={'itemprop':True}):
        print("Property: " + stuff['itemprop'])
        print("Text: " + stuff.get_text())


To compare, take a look at what Google's tool finds for this article: https://search.google.com/structured-data/testing-tool/u/0/#url=http%3A%2F%2Fwww.snopes.com%2Fbumble-bee-tuna-recall-human%2F

One important piece of data in here is the `reviewRating` --> `alternateName` value. In Snopes's case, this is where they tell you if the Claim is `False`, `Mostly False`, or `True`. What can you do with this information? 
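As one possible answer, here is a small sketch that tallies ratings across a batch of fact-checks. It assumes you have already collected each `ClaimReview` as a Python dictionary shaped like the schema.org type (for example, from JSON-LD). The records below are hypothetical placeholders, not real fact-checks, and `rating_label` is a helper name we made up.

```python
from collections import Counter


def rating_label(claim_review):
    """Return reviewRating -> alternateName, or None if it is missing."""
    rating = claim_review.get("reviewRating") or {}
    return rating.get("alternateName")


# Hypothetical placeholder records, shaped like schema.org ClaimReview items
reviews = [
    {"claimReviewed": "Example claim A", "reviewRating": {"alternateName": "False"}},
    {"claimReviewed": "Example claim B", "reviewRating": {"alternateName": "Mostly False"}},
    {"claimReviewed": "Example claim C", "reviewRating": {"alternateName": "False"}},
    {"claimReviewed": "Example claim D"},  # no rating supplied
]

counts = Counter(rating_label(review) for review in reviews)

for label, count in counts.most_common():
    print(str(label) + ": " + str(count))
```

Keep in mind that each publisher picks its own labels, so comparing ratings across sites would need some kind of mapping first.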

How might you use what we've just done to find how fake news is shared on social networks?

### P.S. if you were to add most of the various forms of metadata to our "technology news site" example...

...it might look something like this:

In [None]:
%%HTML

<html>

    <head>
        <title>My Technology News Site - Events in San Francisco</title>
        <meta name="description" content="My Technology News Site has the most interesting technology stories and events.">
        <meta name="keywords" content="tech, news, super important tech news, technology, technology news, technology events, events, San Francisco, Silicon Valley">
        
        <!-- OpenGraph for FB -->
        <meta property="og:title" content="My Technology News Site - Events in San Francisco" />
        <meta property="og:description" content="My Technology News Site has the most interesting technology stories and events." />
        <meta property="og:type" content="website" />
        <meta property="og:image" content="http://mysweettechsite.com/logo.png" />

        <!-- TwitterCard for Twitter -->
        
        <meta name="twitter:card" content="summary" />
        <meta name="twitter:site" content="@mytwitteraccount" />
        <meta name="twitter:title" content="My Technology News Site - Events in San Francisco" />
        <meta name="twitter:image" content="http://mysweettechsite.com/logo.png" />

        <script type="application/ld+json">
        {
            "@context": "http://schema.org",
            "@type": "Event",
            "location": {
                "@type": "Place",
                "address": {
                  "@type": "PostalAddress",
                  "addressLocality": "San Francisco",
                  "addressRegion": "CA"
                },
                "name": "The Moscone Convention Center"
            },
            "name": "Macworld Expo San Francisco",
            "startDate": "2004-01-05T09:00",
            "endDate": "2004-01-09T17:00"
        }
        </script>
    
    </head>

    <body>
        <div class="vevent">
            <div class="summary"><strong>Macworld Expo San Francisco</strong></div>
            <div>
                 <span class="dtstart" title="2004-01-05">January 5</span>-
                 <span class="dtend" title="2004-01-09">9, 2004</span>
            </div>
            <div class="location">Moscone Convention Center, San Francisco, CA</div>
        </div>
    </body>

</html>