# Generate Outdated Content URL Reports

This script searches Google for a specific development branch of the classic Open Targets Platform (targetvalidation.org), collects the URLs from the first 100 results, and writes them to a CSV file that can be submitted to [Google's Remove Outdated Content tool](https://www.google.com/webmasters/tools/removals).

To run the script, you will require:

1. The name of a classic Platform branch that Google has incorrectly indexed and that you want removed from its search results (e.g. ``EFO3``, ``qa``, ``shipit``, etc.)
2. An account with [Zenserp](https://zenserp.com/)*

*Note: the free Zenserp account is limited to 50 Google Search API calls per month. If the branch you want to remove from Google has more than 100 results, increase the ``number_of_search_results_to_include`` value.

## Step 1: Find the Platform branches incorrectly indexed by Google

To find specific Platform branches that have been incorrectly indexed by Google, please run the following Google search:

``site:*.targetvalidation.org -www``

This query matches any indexed URL on a subdomain of ``targetvalidation.org`` except ``www.targetvalidation.org``. In the search results, check in particular whether Google has indexed any of the following branches:

- ``qa.targetvalidation.org``
- ``efo3.targetvalidation.org``
- ``shipit.targetvalidation.org``
- ``master.targetvalidation.org``
- ``dev.targetvalidation.org``
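
If you paste the result URLs from that search into a list, a small helper can summarise which non-``www`` hosts appear. This is only a sketch: the function name ``non_www_hosts`` and the sample URLs are made up for illustration and are not part of the original script.

```python
from urllib.parse import urlparse

def non_www_hosts(urls, domain="targetvalidation.org"):
    """Return the sorted, unique subdomain hosts of `domain` other than www."""
    hosts = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Keep subdomains of the domain, but skip the production www host
        if host.endswith("." + domain) and host != "www." + domain:
            hosts.add(host)
    return sorted(hosts)

# Illustrative URLs only; real search results will differ
sample_urls = [
    "https://www.targetvalidation.org/example-page",
    "https://qa.targetvalidation.org/example-page",
    "https://efo3.targetvalidation.org/",
]
print(non_www_hosts(sample_urls))
```

Each host that the helper returns is a candidate value for ``platform_branch`` in Step 2.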

## Step 2: Set parameters for script to run

In [1]:
zenserp_api_key = 'key goes here' 
platform_branch = 'qa.targetvalidation.org'
search_query = 'site:' + platform_branch
number_of_search_results_to_include = 100
search_engine = 'google.co.uk'
search_location = 'United Kingdom'
search_gl = 'GB'
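
To avoid saving the API key in the notebook, one option is to read it from an environment variable instead. The sketch below assumes a variable named ``ZENSERP_API_KEY``; that name and the ``load_api_key`` helper are hypothetical and not something Zenserp or the original script defines.

```python
import os

def load_api_key(environ=os.environ, name="ZENSERP_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError("Set the %s environment variable first" % name)
    return key

# Demonstration with a stand-in mapping rather than the real environment
print(load_api_key({"ZENSERP_API_KEY": "example-key"}))
```

In the notebook itself, ``zenserp_api_key = load_api_key()`` would then replace the hard-coded string.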

## Step 3: Run Google search using Zenserp API

In [None]:
import requests

headers = {
    'apikey': zenserp_api_key,
}

params = (
    ('q', search_query),
    ('location', search_location),
    ('search_engine', search_engine),
    ('gl', search_gl),
    ('hl', 'en'),
    ('num', number_of_search_results_to_include)
)

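# Set a timeout so the cell cannot hang indefinitely, and fail early on HTTP errors
response = requests.get('https://app.zenserp.com/api/v2/search', headers=headers, params=params, timeout=30)
response.raise_for_status()

search_data = response.json()

# Fall back to an empty list if the response has no 'organic' key
print('There are %i results from this query' % len(search_data.get('organic', [])))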
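
Before moving on, it can help to check that the parsed response has the shape the next step expects. The following is a minimal sketch, assuming that an unsuccessful query returns JSON without an ``organic`` key; the helper name ``count_organic_results`` and both sample payloads are invented for illustration and do not reflect documented Zenserp output.

```python
def count_organic_results(search_data):
    """Return the number of organic results, or raise if the key is missing."""
    if "organic" not in search_data:
        # Listing the keys that are present makes the problem easier to diagnose
        raise ValueError(
            "No 'organic' key in response; keys present: %s" % sorted(search_data)
        )
    return len(search_data["organic"])

# Made-up payload for illustration only
ok_payload = {
    "organic": [
        {"url": "https://qa.targetvalidation.org/"},
        {"title": "an entry without a url"},
    ]
}
print(count_organic_results(ok_payload))
```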

## Step 4: Scrape URLs from search results and generate CSV file

In [None]:
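import csv
import datetime
import os

# Collect the URL of each organic result, skipping entries that have no 'url' field
csv_data = []

for result in search_data.get('organic', []):
    if 'url' in result:
        csv_data.append(result['url'])

today_date = datetime.datetime.now().strftime('%Y-%m-%d')

report_directory = 'reports/'

# Create the output directory if it does not already exist
os.makedirs(report_directory, exist_ok=True)

report_filename = today_date + '_' + platform_branch + '_' + search_engine + '_report.csv'

# Write one URL per row; newline='' lets the csv module control the line endings
with open(report_directory + report_filename, 'w', newline='') as myfile:
    wr = csv.writer(myfile)
    wr.writerows([url] for url in csv_data)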
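
Because every URL in the report will be submitted for removal, it may be worth dropping duplicates and anything that is not on the branch host before writing the file. The sketch below shows one way to do that; ``write_url_report`` is a hypothetical helper, the sample URLs are made up, and the file is written to a temporary directory rather than ``reports/``.

```python
import csv
import os
import tempfile
from urllib.parse import urlparse

def write_url_report(urls, host, path):
    """Write the unique URLs on `host` to `path`, one per row; return them."""
    seen = set()
    kept = []
    for url in urls:
        # Keep only first occurrences of URLs whose hostname matches the branch
        if urlparse(url).hostname == host and url not in seen:
            seen.add(url)
            kept.append(url)
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows([url] for url in kept)
    return kept

# Illustrative input: one duplicate and one URL on the production host
sample = [
    "https://qa.targetvalidation.org/a",
    "https://qa.targetvalidation.org/a",
    "https://www.targetvalidation.org/b",
]
report_path = os.path.join(tempfile.mkdtemp(), "example_report.csv")
print(write_url_report(sample, "qa.targetvalidation.org", report_path))
```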

## Step 5: Upload CSV file to Google's Remove Outdated Content tool

For small lists with fewer than five URLs to remove, you can manually enter each URL into the [Remove Outdated Content tool](https://www.google.com/webmasters/tools/removals) form.

For larger lists, please download and install the [Bulk Outdated Content Removal Chrome extension](https://github.com/noitcudni/google-webmaster-tools-bulk-outdated-content-removal). Once installed, visit the [Remove Outdated Content tool](https://www.google.com/webmasters/tools/removals) page and upload the CSV you created in step 4 by clicking on the `Upload Your File` button.