[Chrome started blocking](https://security.googleblog.com/2019/10/no-more-mixed-messages-about-https_3.html) "mixed content" earlier this year, meaning that https:// pages can only load secure https:// subresources.

But don't panic! This Python script can help you!  🙌

What it does:

*   Loops through all your URLs
*   Spots "Mixed Content" errors from the Chrome Console Log
*   Exports the results to a CSV file!
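
The "spotting" step comes down to filtering console entries. Here is a minimal sketch of it as a pure function. It assumes each entry is a dict with 'level' and 'message' keys, which is the shape Selenium's `get_log('browser')` returns for Chrome. `count_mixed_content` is a hypothetical helper name. The sample messages are made up for illustration and only approximate Chrome's wording.

```python
from collections import Counter


def count_mixed_content(console_log):
    """Count 'Mixed Content' console messages by severity level.

    `console_log` is assumed to be a list of dicts with at least a 'level'
    (e.g. 'SEVERE', 'WARNING') and a 'message' key.
    """
    counts = Counter(
        entry['level']
        for entry in console_log
        if 'Mixed Content' in entry['message']
    )
    # Counter returns 0 for levels that never occurred
    return {'severe_count': counts['SEVERE'], 'warning_count': counts['WARNING']}


# Hypothetical sample entries, for illustration only
sample_log = [
    {'level': 'WARNING', 'message': "Mixed Content: The page at 'https://example.com/' was loaded over HTTPS, but requested an insecure image 'http://example.com/a.png'."},
    {'level': 'SEVERE', 'message': "Mixed Content: The page at 'https://example.com/' was loaded over HTTPS, but requested an insecure script 'http://example.com/b.js'."},
    {'level': 'SEVERE', 'message': 'https://example.com/favicon.ico - Failed to load resource: 404'},
]
print(count_mixed_content(sample_log))  # {'severe_count': 1, 'warning_count': 1}
```

The unrelated 404 error is ignored because only messages containing "Mixed Content" are counted.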

Kudos to [@allophonousrex](https://twitter.com/allophonousrex) for the original script, which can be found [here](https://github.com/DeepCrawlSEO/public/blob/master/Chrome%20Mixed%20Content%20Errors%20Fetch%20v1.2.ipynb). The current script adds a few tweaks, such as:

*   Installs Selenium + Chromium on the Colab runtime, so it works out of the box in Google Colab
*   Imports your CSV file via an upload widget
*   Lets you adjust the fetching speed (batch size) with a slider

Caveat: The script *should* work well on large files, though it has yet to be tested on a list larger than 10k URLs.

Enjoy!

## Install Selenium + Chromium

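```python
!apt-get update
!apt-get install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
from selenium import webdriver

# Set options to run Chrome headless (no window), as required in Colab
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Recent Selenium 4 releases no longer accept the driver path as a positional
# argument; chromedriver is found on the PATH (it was copied to /usr/bin above)
# Start a browser to check that the setup works
wd = webdriver.Chrome(options=options)
```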

## Import libraries

```python
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.common.exceptions import TimeoutException
import pandas as pd
```

## Upload input CSV file

The CSV file should:

*   have a single column called 'url' that lists the URLs to check
*   be called 'url_list.csv'
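
Before uploading, you may want to check the file locally. The following is a minimal sketch using only the standard library. `load_urls` is a hypothetical helper that is not part of the notebook. It reads the 'url' column, drops blanks, duplicates and anything that isn't an absolute http(s) URL, and raises an error if the column is missing.

```python
import csv
import os
import tempfile
from urllib.parse import urlparse


def load_urls(path):
    """Read URLs from a CSV with a 'url' header, skipping blanks and duplicates.

    Raises ValueError if the 'url' column is missing.
    """
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or 'url' not in reader.fieldnames:
            raise ValueError("CSV must have a column named 'url'")
        urls, seen = [], set()
        for row in reader:
            url = (row['url'] or '').strip()
            # Only absolute http(s) URLs can be loaded by the browser
            if urlparse(url).scheme in ('http', 'https') and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Example with a throwaway file (a temp dir, so your real url_list.csv is untouched)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'url_list.csv')
    with open(path, 'w', newline='', encoding='utf-8') as f:
        f.write('url\nhttps://example.com/\n\nhttps://example.com/\nnot a url\n')
    print(load_urls(path))  # ['https://example.com/']
```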

```python
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
```




```python
url_source = '/content/url_list.csv'
```

## Create an output file
This CSV file will be called 'my_urls_mixed_content_errors.csv' and will store the results.
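
The notebook rewrites the entire output file after every batch by reading it, adding the new rows and writing it back. For very large lists, a lighter option is to append each batch in place. Below is a minimal sketch with the standard `csv` module. It assumes the same columns the loop produces, including a 'loaded' flag, and a file that is either missing or empty, so the `-` placeholder written by the next cell would need to be dropped. `append_results` and `FIELDNAMES` are hypothetical names.

```python
import csv
import os
import tempfile

# Assumed column layout, matching the results the loop produces
FIELDNAMES = ['url', 'severe_count', 'warning_count', 'loaded']


def append_results(path, rows):
    """Append result dicts to the output CSV, writing the header only once."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


# Example: two batches appended to a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, 'results.csv')
    append_results(out, [{'url': 'https://example.com/', 'severe_count': 1,
                          'warning_count': 0, 'loaded': True}])
    append_results(out, [{'url': 'https://example.org/', 'severe_count': '',
                          'warning_count': '', 'loaded': False}])
    with open(out, newline='', encoding='utf-8') as f:
        print(f.read())
```

Appending keeps the cost of each batch constant, whereas reading and rewriting the whole file grows with the number of URLs already processed.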

```python
%%writefile my_urls_mixed_content_errors.csv
-
```



```python
url_output = '/content/my_urls_mixed_content_errors.csv'
# Start the output file with just a header row; each batch of results is added to it
df_output = pd.DataFrame(columns=['url', 'severe_count', 'warning_count', 'loaded'])
df_output.to_csv(url_output, index=False)
```

## Adjust fetching speed

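To err on the side of caution, the script processes 8 URLs at a time and adds each batch's results to the output file before moving on. Processed URLs are then removed from url_list.csv, so if the loop is interrupted, re-running it resumes with the remaining URLs. You can raise the batch size for large CSVs, although the script has not been tested on lists of more than 10K URLs.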
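
In plain Python, this batching amounts to walking the list in fixed-size chunks. Here is a small sketch. The `batches` helper is hypothetical and not part of the notebook. Because slices exclude their end index, each new chunk starts exactly `size` items after the previous one. Starting at `size + 1` instead would silently drop one URL per batch.

```python
def batches(items, size):
    """Yield consecutive chunks of `size` items; the last chunk may be shorter."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield items[start:start + size]


urls = [f'https://example.com/page{i}' for i in range(20)]
for batch in batches(urls, 8):
    print(len(batch))  # prints 8, then 8, then 4
```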


```python
rows_per_run = 8 #@param {type:"slider", min:1, max:15, step:1}
print(rows_per_run)
```



## Iterate through URLs, spot + export mixed content errors

```python
#@markdown ##### Double-click to check the underlying code.
#@markdown ---

# Browser options for the per-URL sessions. Besides running headless, they enable
# browser logging at ALL levels so WARNING-level "Mixed Content" messages are
# captured, not just SEVERE ones. ChromeDriver runs in W3C mode by default
# (since version 75), where this capability is named 'goog:loggingPrefs'.
opt = webdriver.ChromeOptions()
opt.add_argument('--headless')
opt.add_argument('--no-sandbox')
opt.add_argument('--disable-dev-shm-usage')
opt.set_capability('goog:loggingPrefs', {'browser': 'ALL'})

df_source = pd.read_csv(url_source)
while len(df_source) > 0:
    new_rows = df_source.iloc[:rows_per_run]
    print(f'{len(new_rows)} rows to process...')
    batch_results = []

    for url in new_rows['url'].tolist():
        # A fresh browser per URL keeps each page's console log separate
        driver = webdriver.Chrome(options=opt)
        driver.set_page_load_timeout(60)  # seconds before a page counts as failed
        try:
            driver.get(url)
            severe_count = 0
            warning_count = 0
            for log in driver.get_log('browser'):
                if 'Mixed Content' in log['message']:
                    if log['level'] == 'SEVERE':
                        severe_count += 1
                    elif log['level'] == 'WARNING':
                        warning_count += 1
            batch_results.append({'url': url, 'severe_count': severe_count,
                                  'warning_count': warning_count, 'loaded': True})

        # Failsafe: a URL that won't load (timeout, DNS error, crash) is recorded
        # as not loaded so it doesn't stop the script from continuing
        except Exception as exc:
            batch_results.append({'url': url, 'severe_count': None,
                                  'warning_count': None, 'loaded': False})
            print(f'Skipping {url}: {type(exc).__name__}')
        finally:
            driver.quit()

    # Read the output CSV, add the new rows, then write the output back again
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
    batch_df = pd.DataFrame(batch_results)
    df_output = pd.read_csv(url_output)
    df_output = batch_df if df_output.empty else pd.concat([df_output, batch_df], ignore_index=True)
    df_output.to_csv(url_output, index=False)

    # Write the source list back without the processed URLs, so an interrupted
    # run can pick up where it left off
    df_source = df_source.iloc[rows_per_run:]
    df_source.to_csv(url_source, index=False)

# Close the browser started in the setup cell
wd.quit()

print("Done!")
```

Voilà! The results are in /content/my_urls_mixed_content_errors.csv!
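
To turn the results file into a to-do list, you could use a small sketch with the standard `csv` module. It assumes the columns the loop writes ('url', 'severe_count', 'warning_count', 'loaded') and that pandas wrote the 'loaded' flag as the strings True/False. Counts are parsed through `float` because pandas may write them as `2.0` when a column also contains blanks. `pages_with_mixed_content` is a hypothetical helper name.

```python
import csv
import os


def pages_with_mixed_content(path):
    """Return (url, severe, warning) for loaded pages with any mixed content,
    sorted by SEVERE count, then WARNING count, highest first."""
    flagged = []
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            if row.get('loaded') != 'True':
                continue  # skip pages that failed to load
            severe = int(float(row['severe_count'] or 0))
            warning = int(float(row['warning_count'] or 0))
            if severe or warning:
                flagged.append((row['url'], severe, warning))
    return sorted(flagged, key=lambda r: (-r[1], -r[2]))


RESULTS = '/content/my_urls_mixed_content_errors.csv'
if os.path.exists(RESULTS):
    for url, severe, warning in pages_with_mixed_content(RESULTS):
        print(f'{url}: {severe} SEVERE, {warning} WARNING')
```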