![Skritter](Skritter-python-selenium-bs4-blank.jpg)

# Skritter Vocabulary Lists


## Introduction 

[Skritter](https://skritter.com/) is a sophisticated piece of software for learning Chinese and Japanese. It uses spaced repetition to teach Chinese and Japanese characters at the stroke level. It is more advanced than, say, Anki, but it requires a subscription. Skritter hosts a number of useful vocabulary lists on its website, including lists for many popular textbooks. Many of these lists are available elsewhere on the internet, but some can be difficult to locate. It is possible to export these lists from Skritter, but this is cumbersome if you want more than one list.

This is a relatively simple notebook written to practice web scraping using [Selenium](https://selenium-python.readthedocs.io/). It logs in to your Skritter account and scrapes vocabulary lists, so you will need to sign up for an account with Skritter first. The purpose of this script is to make life a little easier for Chinese and Japanese language learners. The notebook scrapes vocabulary lists into csv files and produces a description of each list in a markdown file. However, no audio or character stroke information is scraped from Skritter to accompany these lists. The format of the csv files makes them easy to use with other software such as Anki. The columns are:

1. Simplified Chinese
2. Traditional Chinese if different from simplified Chinese, otherwise just a dash (-)
3. Pinyin
4. English
5. Tag (Section)
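
As a rough illustration of that layout, the sketch below writes two rows to an in-memory buffer with the standard `csv` module and reads them back. The sample words and the tag `Lesson-1` are made up for the example and do not come from any Skritter list.

```python
import csv
import io

# Hypothetical sample rows in the five-column layout described above:
# simplified, traditional (or "-"), pinyin, English, tag.
rows = [
    ["你好", "-", "nǐ hǎo", "hello", "Lesson-1"],
    ["谢谢", "謝謝", "xiè xie", "thank you", "Lesson-1"],
]

# Write to an in-memory buffer so the sketch leaves no files behind.
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter=",").writerows(rows)

# Read the rows back and unpack the five columns by position.
buffer.seek(0)
parsed = list(csv.reader(buffer))
for simplified, traditional, pinyin, english, tag in parsed:
    print(simplified, traditional, pinyin, english, tag)
```

Because every row has the same five fields in the same order, Anki (or any other csv-aware tool) can map columns to fields by position.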

Certainly, one could have accomplished the same task by copying, pasting and editing, or by using the export features available on Skritter, but automating the task with Selenium will hopefully make life a whole lot easier for you, especially if you want more than one vocabulary list.


I wrote this to obtain a few of the Chinese lists, but this notebook will also get Japanese lists, if desired. However, please note that for Japanese lists there are still 5 columns in the csv files:

1. Writing = Kanji/Hiragana/Katakana
2. Writing enabled = Who knows? This is still scraped and put into the csv files. Easily removed if desired.
3. Reading = Hiragana/Katakana
4. Definition = English
5. Tag (Section)
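
If the second column is not wanted, it can be dropped after the fact. The following is a minimal sketch, not part of the notebook: `drop_column` is a hypothetical helper, and the sample rows (including the `-` placeholder in the second column) are invented for illustration.

```python
import csv
import io

def drop_column(rows, index):
    """Return a copy of rows with the column at position index removed."""
    return [row[:index] + row[index + 1:] for row in rows]

# Hypothetical Japanese rows: writing, writing enabled, reading, definition, tag.
sample = io.StringIO(
    "猫,-,ねこ,cat,Lesson-1\r\n犬,-,いぬ,dog,Lesson-1\r\n", newline=""
)
rows = list(csv.reader(sample))

# Remove the second column (index 1), leaving four columns per row.
trimmed = drop_column(rows, 1)
for row in trimmed:
    print(row)
```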
 
The functions are rather simple and could possibly be improved upon. Also note that only very simple error handling is included here. A good web scraper should of course try to anticipate as many errors as possible.

Skritter provides a valuable service and if you use their materials you should have a subscription and download only the lists you intend to study. This script was intended to automate and simplify what you can already do on their website.


### NOTES:
I have used Selenium with the Chrome browser. You will need to:
- Install [Chrome](https://www.google.com/chrome/)
- Check the version of Chrome, then get the corresponding *chromedriver* from [here](https://sites.google.com/a/chromium.org/chromedriver/downloads).
- Unzip *chromedriver*
- Put it somewhere sensible. Since I am using a Mac I have moved it to `/usr/local/bin` by 
```bash
mv chromedriver /usr/local/bin
```

- It is **NOT** recommended that you try to download all lists for a particular language at once. This can take a very long time and may overburden their server. Instead, select only a few at a time.
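
To confirm that *chromedriver* ended up somewhere on your `PATH` after the steps above, a quick check along these lines may help. This is only a sketch using the standard library; `find_chromedriver` is a hypothetical helper, not part of Selenium.

```python
import shutil

def find_chromedriver(name="chromedriver"):
    """Return the full path to the driver if it is on PATH, else None."""
    return shutil.which(name)

path = find_chromedriver()
if path is None:
    print("chromedriver was not found on PATH")
else:
    print("chromedriver found at", path)
```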


 
### TODO:
- Improve exception handling. At the moment only minimal exception handling has been included. 
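
One possible direction for this TODO is to pull the retry logic, which the notebook repeats in two functions, into a single helper. The sketch below is an assumption about how that could look, not the notebook's implementation: `call_with_retries` and its parameters are hypothetical, and a built-in `TimeoutError` stands in for Selenium's `TimeoutException` so that the example runs without a browser.

```python
import time

def call_with_retries(action, max_tries=10, wait=60,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds or max_tries attempts have failed.

    Waits `wait` seconds between attempts and re-raises the last error
    once the maximum number of tries is reached.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return action()
        except retry_on as error:
            last_error = error
            if attempt < max_tries:
                sleep(wait)  # give the server a short break before retrying
    raise last_error

# Demonstration with a function that fails twice and then succeeds.
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

result = call_with_retries(flaky, max_tries=5, wait=0, retry_on=(TimeoutError,))
print(result, "after", attempts["count"], "attempts")
```

Re-raising after the final attempt lets the caller decide whether to log the failure or stop, rather than silently continuing.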


## The code

In [1]:
from selenium import webdriver # Web scraping
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC 
from selenium.common.exceptions import TimeoutException

from bs4 import BeautifulSoup # Web scraping
import os # path handling
import time # force a wait period
import csv # writing a csv file
import numpy as np

In [2]:
delay = 60 # In seconds

In [3]:
local_Skr_dir = '/Users/lappy/Skritter'

In [4]:
# Homepage 
baseURL = 'https://skritter.com/'
# Login page 
login_url = "https://skritter.com/login"
# Browse lists page (need to be logged in)
browse_lists_url = 'https://skritter.com/vocablists/browse'

In [5]:
driver = webdriver.Chrome() # you may need to pass your executable_path as an argument

In [6]:
# Login using Selenium
def skr_login(my_username,my_password):
    
    # Load login page
    driver.get(login_url)

    # Wait for login button to be clickable 
    loginBtn = WebDriverWait(driver,delay).until(EC.element_to_be_clickable((By.CLASS_NAME,'btn')))
    
    # Now log in
    # find_element(By.ID, ...) is used because find_element_by_id was removed in Selenium 4
    driver.find_element(By.ID,"login-username").send_keys(my_username)
    driver.find_element(By.ID,"login-password").send_keys(my_password)
    loginBtn.click()


### Put your login details here

In [7]:
#skr_login("yourUserName","yourPassword")
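
Rather than typing your password into the notebook, you could read the login details from environment variables. This is a minimal sketch under assumptions: the variable names `SKRITTER_USERNAME` and `SKRITTER_PASSWORD` and the helper `read_credentials` are hypothetical and not something Skritter or this notebook defines.

```python
import os

def read_credentials(env=os.environ):
    """Return (username, password) taken from the given mapping."""
    username = env.get("SKRITTER_USERNAME")  # hypothetical variable name
    password = env.get("SKRITTER_PASSWORD")  # hypothetical variable name
    if not username or not password:
        raise RuntimeError("Set SKRITTER_USERNAME and SKRITTER_PASSWORD first")
    return username, password

# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {"SKRITTER_USERNAME": "yourUserName",
            "SKRITTER_PASSWORD": "yourPassword"}
username, password = read_credentials(demo_env)
print("Would log in as", username)
```

The returned pair could then be passed on as `skr_login(username, password)`.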

In [8]:
def write_lst(section_name,SC,TC,PY,ENG,lst_directory):
    """
    This function writes a vocabulary list to a csv file.
    
    section_name: string = Name of section of the vocabulary list.
    SC: array of strings = Simplified Chinese.
    TC: array of strings = Traditional Chinese.
    PY: array of strings = Pinyin.
    ENG: array of strings = English.
    lst_directory: string = Path to the directory the csv file is written to.
    
    Notes:
    ------
    
    This was originally written for Chinese, but will also work for Japanese lists.
    
    Please note that if you are getting Japanese lists there are
    still four columns 
        1. Writing = Kanji/Hiragana/Katakana
        2. Writing enabled = Who knows?
        3. Reading = Hiragana/Katakana
        4. Definition = English 
    
    
    """
    
    # Tags are used by Anki
    tag = section_name.replace(' ','-').replace('/','-')
    
    # File name
    fName = tag + '.csv'
    
    if SC != []:
    
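        # newline='' is what the csv module expects; utf-8 keeps Chinese/Japanese text intact
        with open(os.path.join(lst_directory,fName), mode='w', newline='', encoding='utf-8') as csv_file: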
        
            csv_writer = csv.writer(csv_file, delimiter=',')
            
            numOfEntries = min([len(SC),len(TC),len(PY),len(ENG)]) 
        
            for i in range(numOfEntries):
                
                csv_writer.writerow([SC[i], TC[i], PY[i], ENG[i],tag])
                
        print('Written section ' + section_name)
    else:
        print('No vocabulary passed')
    
    return 
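
`write_lst` builds the tag and file name by replacing spaces and slashes with hyphens. Section names containing other characters (colons, question marks and so on) may still cause trouble as file names on some systems. A stricter alternative could look like the sketch below; `make_tag` is a hypothetical helper and is not what the notebook uses.

```python
import re

def make_tag(section_name):
    """Turn a section name into a string usable as a file name and Anki tag."""
    # Replace every run of characters that is neither a word character
    # nor a hyphen with a single hyphen. In Python 3, \w also matches
    # Chinese and Japanese characters, so these are kept.
    tag = re.sub(r"[^\w\-]+", "-", section_name.strip())
    # Drop hyphens left over at either end.
    return tag.strip("-")

print(make_tag("Lesson 1: Greetings/Names"))
print(make_tag("第一课 你好"))
```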

In [9]:
def get_section(driver,section_url):
    """
    This function gets the vocabulary list at the url section_url and
    returns 4 arrays of strings, 
        - SC = Simplified Chinese,
        - TC = Traditional Chinese,
        - PY = Pinyin, and
        - ENG = English.
    
    driver: Selenium web driver which has already logged into Skritter.
    section_url: string = url of the section of the vocabulary list to download.
    
    Notes:
    ------
    
    This was originally written for Chinese, but will also work for Japanese lists.
    
    Please note that if you are getting Japanese lists there are
    still four columns 
        1. Writing = Kanji/Hiragana/Katakana
        2. Writing enabled = Who knows?
        3. Reading = Hiragana/Katakana
        4. Definition = English 
    
    """
    
    # Load section page
    driver.get(section_url)
    
    # Wait for Chinese list to appear. If Chinese is there, then so is the rest
    chContainer = WebDriverWait(driver,delay).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'vocab-writing')))
        
    # Create a beautiful soup
    bs = BeautifulSoup(driver.page_source,'html')
    
    # Get Chinese (two columns, simplified and traditional(usually just a dash))
    # Will also work for Japanese
    chinese_container = bs.find_all('div',{'class':'vocab-writing'})

    # Extract simplified Chinese 
    # Will also work for Japanese, but this column is Kanji/Hiragana/Katakana.
    SC = [sc.text.strip() for sc in chinese_container[::2]]

    # Extract traditional Chinese
    # Will also work for Japanese, but the other column is the writing enabled column.
    TC = [tc.text.strip() for tc in chinese_container[1::2]]
        
    # Get pinyin
    # Will also work for Japanese, but this column is the Reading = Hiragana/Katakana.
    pinyin_container = bs.find_all('div',{'class':'vocab-reading'})
    PY = [py.text.strip() for py in pinyin_container]
        
    # Get English definition
    english_container = bs.find_all('div',{'class':'vocab-definition'})
    ENG = [eng.text.strip() for eng in english_container]
    
    return [SC,TC,PY,ENG]
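
The `[::2]` and `[1::2]` slices in `get_section` rely on the two writing columns arriving interleaved in a single flat sequence. The short example below shows the same slicing on plain strings; the sample words are made up, and only the slicing itself mirrors the function.

```python
# Hypothetical flat sequence as it might come off the page:
# simplified, traditional, simplified, traditional, ...
cells = ["你好", "-", "学习", "學習", "谢谢", "謝謝"]

# Every second item starting at index 0 is the first column,
# every second item starting at index 1 is the second column.
simplified = cells[::2]
traditional = cells[1::2]

# Pair the two columns back up row by row.
pairs = list(zip(simplified, traditional))
print(simplified)
print(traditional)
print(pairs)
```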

In [10]:
def write_markdown_description(lst_NAME,lst_STATS,lst_DESCRIPTION,section_names,lst_directory):
    """
    Creates a markdown file describing the vocabulary list.
    
    lst_NAME: string = Name of the list.
    lst_STATS: string = Creator and word count.
    lst_DESCRIPTION: string = Description of vocabulary list.
    section_names: array of strings = Names of each section of the vocabulary list.
    lst_directory: string = Path to list directory.
    """
    
    # Write lesson summary in a markdown file
    f = open(os.path.join(lst_directory,lst_NAME.replace(' ','-').replace('/','-')+'-Summary.md'),'w+',encoding='utf-8')
    
    f.write('**NAME:** ' + lst_NAME + '\n')
    f.write('\n')
    f.write('**STATS:** ' + lst_STATS + '\n')
    f.write('\n')
    f.write('**DESCRIPTION:** ')
    f.write('\n\n')
    f.write(lst_DESCRIPTION)
    f.write('\n\n')
    f.write('**SECTIONS:** ')
    f.write('\n\n')
    for sn in section_names:
        f.write('\t- ' + sn + '\n')
    
    f.close()

In [11]:
def get_vocab_list_details(driver,lst_url,lang_directory):
    """
    Get the data of the vocabulary list at lst_url and place it in the directory lang_directory.
    
    lst_url: string = url of the vocabulary list you want.
    lang_directory: string = path to the language directory.
    
    Notes:
    ------
    
    This was originally written for Chinese, but will also work for Japanese lists.
    
    Please note that if you are getting Japanese lists there are
    still four columns 
        1. Writing = Kanji/Hiragana/Katakana
        2. Writing enabled = Who knows?
        3. Reading = Hiragana/Katakana
        4. Definition = English 
        
    """
    
    # Create a log file for errors, we are expecting possible TimeoutExceptions
    error_log = open(os.path.join(lang_directory,'error_log.txt'), mode = 'a+')
    
    # We will retry a maximum of 10 times
    total_number_of_tries = 10
    
    error_count = 0
    
    # Load the vocabulary list page    
    driver.get(lst_url)
    
    # Wait for list info to load
    lstName = WebDriverWait(driver,delay).until(
        EC.presence_of_element_located((By.CLASS_NAME,'list-name')))
        
    # Create a beautiful soup 
    bs = BeautifulSoup(driver.page_source,'html')

    # Get the name of the vocabulary list
    lst_NAME = bs.find('div',{'class':'list-name'}).text.strip()
    print('NAME: ',lst_NAME)

    # Get stats
    lst_STATS = bs.find('div',{'class':'list-stats'}).text.strip()
    print('STATS: ',lst_STATS)
    
    # Get description 
    lst_DESCRIPTION = bs.find('div',{'class':'list-description'}).text.strip()
    print('DESCRIPTION: \n\n',lst_DESCRIPTION)
    print('\n')
    
    # Get section information    
    section = bs.find_all('div',{'class':'section-link'})

    section_names = []
    section_urls = []

    print('SECTIONS: \n\n')
        
    # Loop over all the sections in the list
    for s in section:

        # Extract section name 
        section_name = s.span.text.strip()

        # Append section name 
        section_names.append(section_name)

        # Extract section url 
        section_url = baseURL + s['href']

        # Append section url 
        section_urls.append(section_url)

        print(section_name,', ',section_url)
    
    print('\n')
    print('Detected ' + str(len(section_names)) + ' sections.\n')
    
    # Directory for this vocabulary list
    lst_directory = os.path.join(lang_directory,lst_NAME.replace(' ','-').replace('/','-'))

    # Create the list directory if it doesn't already exist
    if not os.path.exists(lst_directory):
        os.mkdir(lst_directory)
        print("Directory: " , lst_directory ,  " created.")
    else:    
        print("Directory: " , lst_directory ,  " already exists. List will be placed here.")   
        
    # Write a markdown file describing the vocabulary list    
    write_markdown_description(lst_NAME,lst_STATS,lst_DESCRIPTION,section_names,lst_directory)
    
    # Get each section's word list and write it to file
    for i,section_url in enumerate(section_urls):
        # Please note that if you are getting Japanese lists there are
        # still four columns 
        # 1. Writing = Kanji/Hiragana/Katakana
        # 2. Writing enabled = Who knows?
        # 3. Reading = Hiragana/Katakana
        # 4. Definition = English 
        
        
        for j in range(0,total_number_of_tries):
            
            try:
        

                [SC,TC,PY,ENG] = get_section(driver,section_url)

                # Write this section's word list to a csv file
                write_lst(section_names[i],SC,TC,PY,ENG,lst_directory)
                
                break
                
            except TimeoutException as e:
                
                error_count = j + 1
                
                if error_count < total_number_of_tries:
                    
                    # We will try again, but first let's give their server a short break
                    
                    # Load something else
                    driver.get('https://www.google.com/')
                    
                    # Wait 
                    time.sleep(delay)
                    
                else:
                    error_log.write('TimeoutException Thrown and caught multiple times\n')
                    error_log.write('\n')
                    error_log.write('Error on section: ' + section_names[i])
                    error_log.write('\n')
                    error_log.write(section_url)
                    error_log.write('\n')
                    error_log.write('error_count: ' + str(error_count))
                    error_log.write('\n')
                    error_log.write('Maximum number of retries reached. You should try to download this section again!')
                    error_log.write('\n')
                    error_log.write('******************************************************')
                    error_log.write('\n')

    error_log.close()
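
The section and list URLs are built by concatenating `baseURL` with each `href`. Since `baseURL` ends in a slash and the hrefs appear to start with one (an inference from the double slashes in the recorded output further down), the result contains `//`. A small sketch of how `urllib.parse.urljoin` from the standard library handles the same inputs:

```python
from urllib.parse import urljoin

base_url = "https://skritter.com/"
# Assumed shape of an href, inferred from the notebook's recorded output.
href = "/vocablists/view/163467231"

# Plain concatenation keeps both slashes.
concatenated = base_url + href

# urljoin resolves the href against the base and yields a single slash.
joined = urljoin(base_url, href)

print(concatenated)
print(joined)
```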
    
        

In [12]:
def get_all_vocab_lists_data_for_lang(driver,lang_directory):
    
    """
    Get the name and url of each vocabulary list on the browse lists page,
    then download every list into lang_directory.
    
    driver: Selenium web driver which is logged into Skritter and currently on the browse lists page.
    lang_directory: string = path to language directory
    """
    
    # Create a log file for errors, we are expecting possible TimeoutExceptions
    error_log = open(os.path.join(lang_directory,'error_log.txt'), mode = 'a+')
    
    # We will retry a maximum of 10 times
    total_number_of_tries = 10
    
    error_count = 0
    
    # Arrays to store all the urls and names for each list
    lst_urls = []
    lst_names = []
    
    # Create a beautiful soup
    bs = BeautifulSoup(driver.page_source,'html')
    
    # Isolate the vocabulary list information
    vocab_lists = bs.find_all('a',{'class':"vocablist-title"})
    
    print('Detected ' + str(len(vocab_lists)) + ' lists.\n')
    
    print('****************************************************')

    for lst in vocab_lists:
    
        # Extract link
        lst_href = baseURL + lst['href']

        # Append list link
        lst_urls.append(lst_href)

        # Extract name 
        lst_name = lst.text.strip()

        # Append list name
        lst_names.append(lst_name)
        
    # Get the details of each list    
    for i,lst_url in enumerate(lst_urls):
        
        print('GETTING LIST NUMBER: ',i)
        
        for j in range(0,total_number_of_tries):
            
            try:
            
                get_vocab_list_details(driver,lst_url,lang_directory)
                
                # if all ok, then break out of the for loop
                break
                
            except TimeoutException as e:
                
                error_count = j + 1
                
                if error_count < total_number_of_tries:
                    
                    # We will try again, but first let's give their server a short break
                    
                    # Load something else
                    driver.get('https://www.google.com/')
                    
                    # Wait 
                    time.sleep(delay)
                    
                else:
                    error_log.write('TimeoutException Thrown and caught multiple times\n')
                    error_log.write('\n')
                    error_log.write('Error on list: ' + lst_names[i])
                    error_log.write('\n')
                    error_log.write(lst_url)
                    error_log.write('\n')
                    error_log.write('error_count: ' + str(error_count))
                    error_log.write('\n')
                    error_log.write('Maximum number of retries reached. You should try to download this list again!')
                    error_log.write('\n')
                    error_log.write('******************************************************')
                    error_log.write('\n')

    error_log.close()
    
    print('****************************************************')
    

In [13]:
def get_all_vocab_lists(lang,directory):
    """
    Downloads all vocabulary lists for a selected language and puts 
    them into your directory. You need to be logged into Skritter 
    using skr_login().
    
    lang: string = 'chinese', 'japanese', or 'both'
    directory: string = path to where you want the lists put.
    """
    
    # Which language?
    if lang.lower() == 'chinese':
        count = 1
        language = 'Chinese'
    elif lang.lower() == 'japanese':
        count = 1
        language = 'Japanese'
    elif lang.lower() == 'both':
        count = 2
    else:
        count = 0
        print('Only chinese, japanese or both are acceptable input for lang')
        
    # Only want one language
    if count == 1:
        
        # Load the browse lists page
        driver.get(browse_lists_url)

        # Wait for vocab to load
        rowsOfVocab = WebDriverWait(driver,delay).until(
            EC.presence_of_element_located((By.CLASS_NAME,'vocablist-wrapper')))
        
        # Check which language
        langBtn = WebDriverWait(driver,delay).until(
            EC.element_to_be_clickable((By.CLASS_NAME,'lang-select-icon-wrapper')))
        langBtn.click()
        
        # Wait for currently studying info to load
        curStudy = WebDriverWait(driver,delay).until(
            EC.presence_of_element_located((By.CLASS_NAME,'currently-studying')))
        
        # Create a beautiful soup
        bs = BeautifulSoup(driver.page_source,'html')
        
        # Find out what is currently being studied
        currently = bs.find('li',{'class':'currently-studying'}).text.strip().lower()
        
        # Change if necessary
        if language.lower() != currently:
            
            print('Changing language to ' + language)
            
            # Change language
            driver.find_element(By.CLASS_NAME,"other-lang-icon").click()
            
            # Changing from one language to another on exactly the same 
            # kind of page, so we force the driver to wait
            time.sleep(15)
            
            # Wait for stuff to load
            rowsOfVocab = WebDriverWait(driver,delay).until(
                EC.presence_of_element_located((By.CLASS_NAME,'vocablist-wrapper')))
                
        print('Getting ' + language + ' lists.')
        
        # Language directory name
        lang_directory = os.path.join(directory,language)

        # Create the language directory if it doesn't already exist
        if not os.path.exists(lang_directory):
            os.mkdir(lang_directory)
            print("Directory: " , lang_directory ,  " created.")
        else:    
            print("Directory: " , lang_directory ,  " already exists. Lists will be placed here.")

        # Get all the data for this language
        get_all_vocab_lists_data_for_lang(driver,lang_directory)

    if count == 2:
        
        # Start with Chinese then switch
        language = 'Chinese' 
        
        # Load the browse lists page
        driver.get(browse_lists_url)

        # Wait for vocab to load
        rowsOfVocab = WebDriverWait(driver,delay).until(
            EC.presence_of_element_located((By.CLASS_NAME,'vocablist-wrapper')))
        
        # Check which language
        langBtn = WebDriverWait(driver,delay).until(
            EC.element_to_be_clickable((By.CLASS_NAME,'lang-select-icon-wrapper')))
        langBtn.click()
        
        # Wait for currently studying info to load
        curStudy = WebDriverWait(driver,delay).until(
            EC.presence_of_element_located((By.CLASS_NAME,'currently-studying')))
        
        # Create a beautiful soup
        bs = BeautifulSoup(driver.page_source,'html')
               
        currently = bs.find('li',{'class':'currently-studying'}).text.strip().lower()
        
        # Change if necessary
        if language.lower() != currently:
            
            print('Changing language to ' + language)
            
            # Change language
            driver.find_element(By.CLASS_NAME,"other-lang-icon").click()
            
            # Changing from one language to another on exactly the same 
            # kind of page, so we force the driver to wait
            time.sleep(15)
            
            # Wait for stuff to load
            rowsOfVocab = WebDriverWait(driver,delay).until(
                EC.presence_of_element_located((By.CLASS_NAME,'vocablist-wrapper')))
              
        print('Getting ' + language + ' lists.')
    
        # Language directory name
        lang_directory = os.path.join(directory,language)

        # Create the language directory if it doesn't already exist
        if not os.path.exists(lang_directory):
            os.mkdir(lang_directory)
            print("Directory: " , lang_directory ,  " created.")
        else:    
            print("Directory: " , lang_directory ,  " already exists. Lists will be placed here.")

        # Get all the data for Chinese
        get_all_vocab_lists_data_for_lang(driver,lang_directory)
        
        # Switch language to Japanese
        language = 'Japanese'
        
        # Load browse lists page
        driver.get(browse_lists_url)
        
        # Wait for vocab to load
        rowsOfVocab = WebDriverWait(driver,delay).until(
            EC.presence_of_element_located((By.CLASS_NAME,'vocablist-wrapper')))
        
        # Change language
        langBtn = WebDriverWait(driver,delay).until(
            EC.element_to_be_clickable((By.CLASS_NAME,'lang-select-icon-wrapper')))
        langBtn.click()
        
        # Switch to the other language
        otherLangBtn = WebDriverWait(driver,delay).until(
            EC.element_to_be_clickable((By.CLASS_NAME,'other-lang-icon')))
        otherLangBtn.click()
        
        # Changing from one language to another on exactly the same 
        # kind of page, so we force the driver to wait
        time.sleep(15)
                
        print('Getting ' + language + ' lists.')
    
        # Language directory name
        lang_directory = os.path.join(directory,language)

        # Create the language directory if it doesn't already exist
        if not os.path.exists(lang_directory):
            os.mkdir(lang_directory)
            print("Directory: " , lang_directory ,  " created.")
        else:    
            print("Directory: " , lang_directory ,  " already exists. Lists will be placed here.")

        # Get all the data for Japanese
        get_all_vocab_lists_data_for_lang(driver,lang_directory)
             
    return 'Done' 
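
`get_all_vocab_lists` repeats most of its body for the single-language and `'both'` cases. One way to shrink it would be to map the `lang` argument to a list of languages and loop over that list. The sketch below shows only the mapping step; `languages_for` is a hypothetical helper, not part of the notebook, and it raises an error instead of printing a message.

```python
def languages_for(lang):
    """Map the lang argument to the list of languages to download."""
    choices = {
        "chinese": ["Chinese"],
        "japanese": ["Japanese"],
        "both": ["Chinese", "Japanese"],  # Chinese first, as in the notebook
    }
    try:
        return choices[lang.lower()]
    except KeyError:
        raise ValueError(
            "Only chinese, japanese or both are acceptable input for lang"
        ) from None

for language in languages_for("both"):
    print("Getting " + language + " lists.")
```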
    

In [14]:
def get_vocab_list_links(lang):
    
    """
    Downloads all links for vocabulary lists for a selected 
    language. You need to be logged into Skritter using skr_login().
    
    lang: string = 'chinese' or 'japanese' only.
    """
    
    # Which language?
    if lang.lower() == 'chinese':
        language = 'Chinese'
    elif lang.lower() == 'japanese':
        language = 'Japanese'
    else:
        print('Only chinese or japanese are acceptable input for lang')
        # Stop here, otherwise `language` would be undefined below
        return [],[]
              
    # Load the browse lists page
    driver.get(browse_lists_url)

    # Wait for vocab to load
    rowsOfVocab = WebDriverWait(driver,delay).until(
        EC.presence_of_element_located((By.CLASS_NAME,'vocablist-wrapper')))
        
    # Check which language
    langBtn = WebDriverWait(driver,delay).until(
        EC.element_to_be_clickable((By.CLASS_NAME,'lang-select-icon-wrapper')))
    langBtn.click()
        
    # Wait for currently studying info to load
    curStudy = WebDriverWait(driver,delay).until(
        EC.presence_of_element_located((By.CLASS_NAME,'currently-studying')))
        
    # Create a beautiful soup
    bs = BeautifulSoup(driver.page_source,'html')
        
    # Find out what is currently being studied
    currently = bs.find('li',{'class':'currently-studying'}).text.strip().lower()
        
    # Change if necessary
    if language.lower() != currently:
            
        print('Changing language to ' + language)
            
        # Change language
        driver.find_element(By.CLASS_NAME,"other-lang-icon").click()
            
        # Changing from one language to another on exactly the same 
        # kind of page, so we force the driver to wait
        time.sleep(15)
            
        # Wait for stuff to load
        rowsOfVocab = WebDriverWait(driver,delay).until(
            EC.presence_of_element_located((By.CLASS_NAME,'vocablist-wrapper')))
                
    print('Getting ' + language + ' list links.')
        
    

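    # Re-parse the page, since the language may have been switched after the soup was created
    bs = BeautifulSoup(driver.page_source,'html')

    # Isolate the vocabulary list information
    vocab_lists = bs.find_all('a',{'class':"vocablist-title"})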
    
    print('Detected ' + str(len(vocab_lists)) + ' lists.\n')
    
    print('****************************************************')

    lst_urls = []
    lst_names = []
    
    for lst in vocab_lists:
    
        # Extract link
        lst_href = baseURL + lst['href']

        # Append list link
        lst_urls.append(lst_href)

        # Extract name 
        lst_name = lst.text.strip()

        # Append list name
        lst_names.append(lst_name)
        
    # Get the details of each list    
    for i,lst_url in enumerate(lst_urls):
        
        print(i,lst_names[i],lst_urls[i])
    
    
    
    return lst_names,lst_urls
    

## Chinese textbook lists (Not recommended)
Note that this does not download every Chinese list available on the site, just the best ones. It can take days and needs better error handling.

In [15]:
#get_all_vocab_lists('chinese',local_Skr_dir)

## Japanese textbook lists (Not recommended)
Note that this does not download every Japanese list available on the site, just the best ones.

In [16]:
#get_all_vocab_lists('japanese',local_Skr_dir)

## Both Chinese and Japanese textbook lists  (Not recommended)
Note that this does not download every list available on the site, just the best ones.

In [17]:
#get_all_vocab_lists('both',local_Skr_dir)

## Not greedy? Do you only want a few select lists?  (Recommended)

If you just want a few lists, first get the names and URLs of the available lists:

In [18]:
names,links = get_vocab_list_links('Chinese')

Changing language to Chinese
Getting Chinese list links.
Detected 400 lists.

****************************************************
0 Skritter Chinese 101 https://skritter.com//vocablists/view/163467231
1 HSK 1 https://skritter.com//vocablists/view/47872248
2 250 Essential Chinese Characters Vol. 1 Revised Edition https://skritter.com//vocablists/view/4810723636019200
3 Integrated Chinese 1 (4th Edition, 2017) https://skritter.com//vocablists/view/6136075210850304
4 HSK 2 https://skritter.com//vocablists/view/47828570
5 100 Common Radicals https://skritter.com//vocablists/view/5187734513778688
6 HSK 3 https://skritter.com//vocablists/view/47892865
7 Integrated Chinese Level 1, Parts 1 & 2 (3rd edition, 2008) https://skritter.com//vocablists/view/39020869
8 HSK 4 https://skritter.com//vocablists/view/47846448
9 Integrated Chinese Level 2, Parts 1 & 2 (3rd Edition, 2009) https://skritter.com//vocablists/view/39066148
10 HSK 5 https://skritter.com//vocablists/view/47879238
11 HSK 6 https:/

In [19]:
selection = np.arange(20,30)
for i in selection:
    print(i,names[i],links[i])

20 Chinese Made Easy Level 1 https://skritter.com//vocablists/view/51258977
21 Chinese Link 1 https://skritter.com//vocablists/view/40625531
22 Boya Chinese: Elementary Starter II https://skritter.com//vocablists/view/12570320
23 Integrated Chinese Level 1, Part 2 (3rd Edition, 2008) https://skritter.com//vocablists/view/363513317
24 Chinese Breeze: I Really Want to Find Her... 我一定要找到她…… https://skritter.com//vocablists/view/39337098
25 Integrated Chinese Level 1, Part 1 (3rd Edition, 2008) https://skritter.com//vocablists/view/6180158510202880
26 Chinese Stroke Names https://skritter.com//vocablists/view/63181049
27 New Practical Chinese Reader (Vol 2) https://skritter.com//vocablists/view/4507254057074688
28 Anything Goes, First Edition https://skritter.com//vocablists/view/11131583
29 Boya Chinese: Pre-intermediate Speed Up I https://skritter.com//vocablists/view/12495821


In [20]:
selected_urls = [links[i] for i in selection]
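
Picking lists by index works, but the indices change whenever Skritter reorders the browse page. Selecting by name is an alternative. The following is a sketch only: `select_by_keyword` is a hypothetical helper, and the three names and links are copied from the output above to keep the example self-contained.

```python
def select_by_keyword(names, links, keyword):
    """Return the links whose list name contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [link for name, link in zip(names, links) if keyword in name.lower()]

# A few entries taken from the output shown earlier in this notebook.
sample_names = ["HSK 1", "Chinese Link 1", "HSK 2"]
sample_links = [
    "https://skritter.com//vocablists/view/47872248",
    "https://skritter.com//vocablists/view/40625531",
    "https://skritter.com//vocablists/view/47828570",
]

hsk_urls = select_by_keyword(sample_names, sample_links, "hsk")
for url in hsk_urls:
    print(url)
```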

Links obtained from Skritter 

In [25]:
# If the lists you want are not on the main page, you can get them by copying and pasting their links below
#selected_urls = ['firstLink',
#                'secondLink',
#                'thirdLink',
#                'fourthLink',
#                'fifthLink',
#                 'sixthLink']


This function will get your selected lists

In [26]:
def get_selected_lists(driver,selected_urls,directory):
    """
    Downloads selected vocabulary lists and puts them into 
    your directory. You need to be logged into Skritter 
    using skr_login().
    
    driver:  Selenium web driver logged into Skritter 
    selected_urls: list = urls of selected lists
    directory: string = path to where you want the lists put.
    """
    
    # Create the directory (and any missing parent directories) if it doesn't already exist
    if not os.path.exists(directory):
        os.makedirs(directory)
        print("Directory: " , directory ,  " created.")
    else:    
        print("Directory: " , directory ,  " already exists. Lists will be placed here.")
    
    N = len(selected_urls)
    
    for i in np.arange(N):
        
        print('Getting list ', i+1, ' of ', N)
        
        get_vocab_list_details(driver,selected_urls[i],directory)
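
The same "create the directory if it is missing" step appears in several functions in this notebook. A minimal sketch of how it could be factored out, using `os.makedirs` with `exist_ok=True`; `ensure_directory` is a hypothetical helper, and the example runs inside a temporary directory so it leaves nothing behind.

```python
import os
import tempfile

def ensure_directory(path):
    """Create path (and missing parents) if needed; return True if it was created."""
    created = not os.path.isdir(path)
    os.makedirs(path, exist_ok=True)
    return created

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "Skritter", "Selected_lessons")
    first = ensure_directory(target)   # directory did not exist yet
    second = ensure_directory(target)  # directory already exists
    print(first, second)
```

Unlike `os.mkdir`, `os.makedirs` also creates missing parent directories, so a nested path works even when the parent folder has not been created yet.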

In [27]:
my_selected_dir = os.path.join(local_Skr_dir,'Selected_lessons')

In [None]:
get_selected_lists(driver,selected_urls,my_selected_dir)