# <span id="up" style="color:red">RED</span>ACTOR

**<span id="up" style="color:red">RED</span>ACTOR is a cozy Jupyter Notebook created by [Bilinguator.com](https://bilinguator.com/) to make texts cleaner. It helps to put quotation marks, parentheses, dashes, and ellipses in the desired format, remove unnecessary line breaks or page numbers, and find and tag titles and character names. For more information, visit the [GitHub page](https://github.com/bilinguator/redactor/).**

|Contents|
|---|
|[Load modules and scripts](#load-modules-and-scripts)|
|[Load text from file](#load-text-from-file)|
|[Technical artifacts](#technical-artifacts)|
|[Alphabet](#alphabet)|
|[Repetitions](#repetitions)|
|[Search for substrings](#search-for-substrings)|
|[Ellipsis](#ellipsis)|
|[Apostrophes](#apostrophes)|
|[Quotation marks](#quotation-marks)|
|[Dashes](#dashes)|
|[Characters in the play](#characters-in-the-play)|
|[Poetry](#poetry)|
|[Dot the ⟨Ё⟩ letters in Russian](#dot-the-yo)|
|[Headings](#headings)|
|[Split text](#split-text)|
|[Save text](#save-text)|

# <span id='load-modules-and-scripts'>Load modules and scripts</span>

[Up](#up) | [Save text](#save-text)

In [None]:
import os
import sys
import math
import html
import re
from tqdm import tqdm
import matplotlib.pyplot as plt 
from alphabets import *
from paragraphs import *
from quotes_and_brackets import *
from redactor import *
from file_split import *

# <span id='load-text-from-file'>Load text from file</span>

[Up](#up) | [Save text](#save-text)

**`lang` - ISO code of the language. Codes are available in `get_alphabet` function of the `alphabets.py` script.**

**`file_address` - TXT file to be preprocessed.**

**All repeated `\n` are replaced by one `\n` here.**

In [None]:
lang = 'en'
file_address = 'path/to/text'
with open(file_address, 'r', encoding='utf-8') as file:
    text = file.read().strip()

assert text != '', f'File {file_address} is empty!'
print(f'File \033[1m{file_address}\033[0m contains:')
print(f'• {len(text)} symbols;')
print('•', text.count('\n')+1, 'paragraphs.')

**Convert special HTML characters to the readable Unicode characters.**

In [None]:
text = html.unescape(text)
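
As a quick illustration of this step, the sketch below runs the standard-library `html.unescape` on a made-up sample string. The sample is an assumption for demonstration only and is not taken from any real file.

```python
import html

# Hypothetical sample with named and numeric HTML entities.
sample = 'Tom &amp; Jerry &#8212; &quot;friends&quot;'

# Entities are converted to the corresponding Unicode characters.
cleaned = html.unescape(sample)
print(cleaned)
```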

**Trim leading and trailing characters specified in `characters` in every paragraph.**

In [None]:
characters = '　'
text_len_before = len(text)
text = trim_paragraphs(text, characters)
symbols_trimmed_count = text_len_before - len(text)
print(f'{symbols_trimmed_count} symbols trimmed out.')
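
The trimming step can be pictured with the minimal sketch below. `trim_paragraphs_sketch` is a hypothetical stand-in written for illustration; the real `trim_paragraphs` from `paragraphs.py` may behave differently (for example, it may also strip ordinary whitespace).

```python
def trim_paragraphs_sketch(text: str, characters: str) -> str:
    """Strip the given characters from both ends of every paragraph.

    Hypothetical helper, not the notebook's own trim_paragraphs.
    """
    return '\n'.join(p.strip(characters) for p in text.split('\n'))


# Made-up sample padded with ideographic spaces (U+3000).
sample = '\u3000First paragraph\u3000\n\u3000\u3000Second paragraph'
trimmed = trim_paragraphs_sketch(sample, '\u3000')
print(trimmed)
print(len(sample) - len(trimmed), 'symbols trimmed out.')
```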

# <span id='technical-artifacts'>Technical artifacts</span>

[Up](#up) | [Save text](#save-text)

## Technical line breaks

**Remove technical line breaks, that is, breaks introduced when text is converted from one file format to another. For example, saving a PDF as TXT often leaves line breaks in the middle of sentences. The `remove_tech_line_breaks` function removes such breaks and returns the repaired text. `glue` is the string used to join the split parts of a sentence. For Japanese (`ja`) and Chinese (`zh`), set `glue` to an empty string.**

In [None]:
glue = ' '
print(len(text.split('\n')), 'paragraphs before removing technical line breaks.')
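# Assumption: the function accepts `glue` as a keyword argument, as the description above suggests.
text = remove_tech_line_breaks(text, glue=glue, start_line=2)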
print(len(text.split('\n')), 'paragraphs after removing technical line breaks.')
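
To make the idea concrete, here is a minimal sketch of one possible heuristic: a line is merged into the previous one when the previous line does not end with sentence-final punctuation. This is an assumption for illustration; `merge_broken_lines` is a hypothetical function, and the actual heuristic inside `remove_tech_line_breaks` is not shown in this notebook.

```python
def merge_broken_lines(text: str, glue: str = ' ', start_line: int = 0) -> str:
    """Merge lines that look like a sentence broken in the middle.

    Hypothetical heuristic: a line continues the previous one if the
    previous line does not end with sentence-final punctuation.
    Lines before `start_line` (0-based) are left untouched.
    """
    sentence_end = ('.', '!', '?', '…', ':', '"', '”', '»')
    lines = text.split('\n')
    result = lines[:start_line]
    for line in lines[start_line:]:
        # Only look back at lines that were themselves eligible for merging.
        prev = result[-1] if len(result) > start_line else ''
        if prev and not prev.endswith(sentence_end):
            result[-1] = prev + glue + line
        else:
            result.append(line)
    return '\n'.join(result)


sample = ('TITLE\nAuthor\nIt was a bright cold day in\n'
          'April, and the clocks\nwere striking thirteen.\nNext paragraph.')
repaired = merge_broken_lines(sample, glue=' ', start_line=2)
print(repaired)
```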

## Page numbers

**Detect paragraphs with page numbers.**

In [None]:
page_number_paragraphs = detect_chapters(text, chapter='', numbering='arabic')
print('Paragraphs with page numbers:', page_number_paragraphs)

**Remove paragraphs with page numbers.**

In [None]:
text = remove_paragraphs(text, page_number_paragraphs)

## Repeating paragraphs

**Detect page headings and other repeating paragraphs.**

In [None]:
repeated_paragraphs = count_repeated_paragraphs(text)

if bool(repeated_paragraphs):
    print('Repeating paragraphs\n(times)')
    for key, value in repeated_paragraphs.items():
        print(f'\n{key}\n({value})')
else:
    print('No repeating paragraphs found.')
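
Counting repeated paragraphs can be sketched with `collections.Counter`. `count_repeats` below is a hypothetical helper; whether `count_repeated_paragraphs` returns the same structure or applies a different threshold is an assumption.

```python
from collections import Counter


def count_repeats(text: str, min_count: int = 2) -> dict:
    """Return non-empty paragraphs that occur at least `min_count` times."""
    counts = Counter(p for p in text.split('\n') if p.strip())
    return {p: n for p, n in counts.items() if n >= min_count}


# Made-up sample where a running page header repeats.
sample = 'Page header\nFirst paragraph.\nPage header\nSecond paragraph.\nPage header'
repeats = count_repeats(sample)
print(repeats)
```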

**Specify repeating paragraph to be removed in the `paragraph_to_remove` variable.**

In [None]:
paragraph_to_remove = 'Спасибо, что скачали книгу в бесплатной электронной библиотеке Royallib.ru'

# Escape the paragraph so that characters like '.' are matched literally.
text = re.split(rf'{re.escape(paragraph_to_remove)}$', text, flags=re.MULTILINE)
text = [c.strip() for c in text]
text = '\n'.join(text)

## Footnote links

**Find footnote links like `{1}`, `[1]`, `(1)`, etc.**

In [None]:
footnote_regex = r'\{\d+\}|\[\d+\]|\(\d+\)'
footnote_links = set(re.findall(footnote_regex, text))
if bool(footnote_links):
    print(*footnote_links)
else:
    print('No footnote links found.')

**Remove footnote links.**

In [None]:
text = re.sub(footnote_regex, '', text)
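
The sketch below applies the same kind of pattern to a made-up sentence, which also shows why the found links are worth reviewing before removal: a parenthesized number is matched even when it is not a footnote. The sample text is an assumption.

```python
import re

# Matches {1}, [1] and (1) style links with any number of digits.
footnote_pattern = re.compile(r'\{\d+\}|\[\d+\]|\(\d+\)')

sample = 'First claim{1} and second claim[12], but (see below) stays (3).'

found = sorted(set(footnote_pattern.findall(sample)))
stripped = footnote_pattern.sub('', sample)
print(found)
print(stripped)
```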

# Notes

[Up](#up) | [Save text](#save-text)

**If notes look like `[1 - Sample text]`, change it to `[* Sample text]`.**

In [None]:
text = re.sub(r'\[\d+ -', '[*', text)

**Print paragraphs by query.** 

In [None]:
print_paragraphs_by_query(text, '[*')

# <span id='alphabet'>Alphabet</span>

[Up](#up) | [Save text](#save-text)

**Check text’s alphabet.**

In [None]:
text_characters = set(text.upper()) - set([' ', '\n'])
print('Unique characters of the text:')
print(set_to_str(text_characters))
print()
print(f'Characters of the text subtracted by the \033[1m{lang}\033[0m alphabet:')
print(set_to_str(text_characters - get_alphabet(lang)))
print()
print(f'\033[1m{lang}\033[0m alphabet:')
print(set_to_str(get_alphabet(lang)))
print()
print(f'Unused characters of the \033[1m{lang}\033[0m alphabet:')
print(set_to_str(get_alphabet(lang) - text_characters))
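
The alphabet check is plain set arithmetic. In the sketch below, `english_alphabet` is a hypothetical stand-in for the result of `get_alphabet('en')`, which in the real script probably also contains digits and punctuation.

```python
import string

# Hypothetical stand-in for get_alphabet('en'): letters only.
english_alphabet = set(string.ascii_uppercase)

sample = 'Café costs €5'
sample_characters = set(sample.upper()) - {' ', '\n'}

# Characters of the text that are outside the alphabet.
foreign = sorted(sample_characters - english_alphabet)
# Letters of the alphabet that the text never uses.
unused = sorted(english_alphabet - sample_characters)
print(foreign)
print(''.join(unused))
```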

**Delete unnecessary characters.**

In [None]:
unnecessary_characters = '€'

for c in unnecessary_characters:
    text = text.replace(c, '')

**Print paragraphs by query.** 

In [None]:
print_paragraphs_by_query(text, '–')

# <span id='repetitions'>Repetitions</span>

[Up](#up) | [Save text](#save-text)

**Find character repetitions in the text.**

In [None]:
print('Characters’ repetitions in the text:')
text_characters_all = set(text.upper())
text_upper = text.upper()
for c in text_characters_all:
    count = text_upper.count(c*2)
    if count > 0:
        print(f'{c*2} - {count}')
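
As a complementary sketch, a backreference regex reports whole runs of a repeated character, so `...` is listed as one run of three dots rather than as one doubled dot. This is an alternative technique, not what the cell above does, and the sample string is made up.

```python
import re
from collections import Counter

sample = 'Wait... Hello  world!! Aaah'

# (.)\1+ matches any character followed by one or more copies of itself.
runs = Counter(m.group(0) for m in re.finditer(r'(.)\1+', sample, flags=re.DOTALL))
for run, count in runs.items():
    print(repr(run), '-', count)
```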

**Specify repetition in the `query` variable and explore it in the text. If `query`’s case matters, specify it in the `case_matters` attribute of the `print_paragraphs_by_query` function.**

In [None]:
query = '..'
print_paragraphs_by_query(text, query, case_matters=False)

# <span id='search-for-substrings'>Search for substrings</span>

[Up](#up) | [Save text](#save-text)

**Set the `query` variable to the character you are looking for and count how many hits are found in each matching paragraph. If `query`’s case matters, specify it in the `case_matters` argument of the `search_in_paragraphs` function.**

In [None]:
query = '-'
search_in_paragraphs(text, query, case_matters=False)

**Print paragraphs with the `query`. If `query`’s case matters, specify it in the `case_matters` attribute of the `print_paragraphs_by_query` function.**

In [None]:
query = '('
print_paragraphs_by_query(text, query, case_matters=False)

# <span id='ellipsis'>Ellipsis</span>

[Up](#up) | [Save text](#save-text)

**Check how many ellipses of each kind there are in the text.**

In [None]:
ellipses = ['. . .', '...', '... .', '....', '⋯', '…']

print('\033[1mEllipsis counts\033[0m')
for ellipsis in ellipses:
    print(text.count(ellipsis), ellipsis, sep='\t')

**Print paragraphs with an ellipsis specified in `query`.**

In [None]:
query = '...'
print_paragraphs_by_query(text, query, case_matters=False)

**Replace `old_ellipsis` with `new_ellipsis`.**

In [None]:
old_ellipsis = '...'
new_ellipsis = '…'
print(text.count(old_ellipsis), '- before replacement.')
text = text.replace(old_ellipsis, new_ellipsis)
print(text.count(old_ellipsis), '- after replacement.')
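
A plain `str.replace` of `...` also rewrites the first three dots of `....`, leaving `….` behind. The sketch below is one possible alternative, assuming you want to convert only runs of exactly three dots (optionally separated by single spaces) and leave longer runs for manual review. The pattern and the sample are assumptions, not part of the notebook's scripts.

```python
import re

# Exactly three dots, optionally separated by single spaces,
# and not part of a longer run of dots.
three_dots = re.compile(r'(?<!\.)\.(?: ?\.){2}(?!\.)')

sample = 'Well. . . maybe.... or not...'
normalized = three_dots.sub('…', sample)
print(normalized)
```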

# <span id='apostrophes'>Apostrophes</span>

[Up](#up) | [Save text](#save-text)

**Count all apostrophe characters in the text before recovering.**

In [None]:
print(text.count('\''), '- typewriter apostrophes ⟨\'⟩.')
print(text.count('’'), '- punctuation apostrophes ⟨’⟩.')

**Replace typewriter apostrophe `'` with punctuation apostrophe `’`. To replace all apostrophes located inside words, specify `replace_inner=True`.**

In [None]:
text = recover_apostrophes_by_lang(text, lang, replace_inner=True)

**Count all apostrophe characters in the text after recovering.**

In [None]:
print(text.count('\''), '- typewriter apostrophes ⟨\'⟩.')
print(text.count('’'), '- punctuation apostrophes ⟨’⟩.')

**Explore the paragraphs with typewriter apostrophes ⟨`'`⟩.**

In [None]:
print_paragraphs_by_query(text, '\'')

**Explore the paragraphs with punctuation apostrophes ⟨`’`⟩.**

In [None]:
print_paragraphs_by_query(text, '’')

**Replace typewriter apostrophes ⟨`'`⟩ with punctuation apostrophes ⟨`’`⟩ in the interactive mode. Leave input fields empty if no replacement is needed, otherwise type any characters.**

In [None]:
text = replace_interactively(text, '\'', '’', scope=55)

In [None]:
print_paragraphs_by_query(text, '\'')

## Direct replacement

**If applicable, replace all apostrophes `old_apostrophe` with `new_apostrophe` directly.**

In [None]:
old_apostrophe = '\''
new_apostrophe = '’'
text = text.replace(old_apostrophe, new_apostrophe)

# <span id='quotation-marks'>Quotation marks</span>

[Up](#up) | [Save text](#save-text)

## What quotation marks are there in the text?

In [None]:
text_characters = set(text.upper()) - set([' ', '\n'])

print('Quotation marks found in the text:')
text_quotes = set_to_str(text_characters & get_quotes())
print(text_quotes)
print()

print(f'Quotation marks of the \033[1m{lang}\033[0m language:')
print(quotes_to_str_by_lang(lang))
print()

print(f'Quotation marks of the \033[1m{lang}\033[0m language used in the text:')
print(set_to_str(set(text_quotes) & set(quotes_to_str_by_lang(lang))))

In [None]:
print_paragraphs_by_query(text, '"')

## Automatic replacement

**Choose quotes `old_quotes` to be replaced with `new_quotes` automatically.**

In [None]:
for level in ('primary', 'secondary'):
    print(get_quotes(lang, level=level), f'— {level} quotes.')

In [None]:
old_quotes = ('"', '"')
new_quotes = ('«', '»')

text = replace_by_regex(text, rf'\s{old_quotes[0]}\S', old_quotes[0], new_quotes[0])
text = replace_by_regex(text, rf'\S{old_quotes[1]}\s', old_quotes[1], new_quotes[1])

print('Quotes counts after automatic replacement.\n')
print('\033[1mQuote\tCount\033[0m')
for quotes in [old_quotes, new_quotes]:
    for quote in quotes:
        print(f'{quote}\t{text.count(quote)}')
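
Judging by its call sites, `replace_by_regex` appears to replace one substring with another only inside fragments matched by a regular expression. The sketch below implements that reading with `re.sub`; `replace_in_matches` is a hypothetical name and the inferred behavior is an assumption. Note that a quote at the very start or end of the whole text has no neighboring whitespace character and is not matched by these two patterns.

```python
import re


def replace_in_matches(text: str, pattern: str, old: str, new: str) -> str:
    """Replace `old` with `new`, but only inside fragments matched by `pattern`."""
    return re.sub(pattern, lambda m: m.group(0).replace(old, new), text)


sample = 'He said "hello" and left.'
# Opening quote: whitespace before, non-whitespace after.
sample = replace_in_matches(sample, r'\s"\S', '"', '«')
# Closing quote: non-whitespace before, whitespace after.
sample = replace_in_matches(sample, r'\S"\s', '"', '»')
print(sample)
```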

## Interactive replacement

**Choose target quotation marks of the language of interest. Then, replace quotation marks in the interactive mode:**

**`old_quote` - attribute for quotation mark (opening or closing) to be replaced;**

**`new_quote` - attribute for target quotation mark.**

**Type anything to accept change, otherwise leave the input field empty.**

In [None]:
for level in ('primary', 'secondary'):
    print(get_quotes(lang, level=level), f'— {level} quotes.')

In [None]:
old_quote = '"'
new_quote = '”'
text = replace_interactively(text, old_quote, new_quote)

print('Quotes counts after interactive replacement.\n')
print('\033[1mQuote\tCount\033[0m')
for quote in (old_quote, new_quote):
    print(f'{quote}\t{text.count(quote)}')

## Direct replacement

**If applicable, replace all quotes `old_quote` with `new_quote` directly.**

In [None]:
for level in ('primary', 'secondary'):
    print(get_quotes(lang, level=level), f'— {level} quotes.')

In [None]:
old_quote = '”'
new_quote = '“'
text = text.replace(old_quote, new_quote)

# <span id='dashes'>Dashes</span>

[Up](#up) | [Save text](#save-text)

**`-` — hyphen, part of compound words;**

**`–` — en dash, used for number intervals;**

**`—` — em dash, a multipurpose punctuation mark used in dialogue, for breaks in thought, and in place of colons or parentheses.**

**Check how many dashes are in the text.**

In [None]:
hyphen = '-'
en_dash = '–'
em_dash = '—'

print('\033[1mDash type\tDash\tCount\033[0m')
print('hyphen', hyphen, text.count(hyphen), sep='\t')
print('en_dash', en_dash, text.count(en_dash), sep='\t')
print('em_dash', em_dash, text.count(em_dash), sep='\t')

**Print paragraphs with a `query` dash.**

In [None]:
query = hyphen
print_paragraphs_by_query(text, query, case_matters=False)

## Dialogue dashes

**Detect if there are dialogue paragraphs and what dash they start with.**

In [None]:
print(f'Paragraphs starting with hyphen ⟨{hyphen}⟩:', end='\t')
print(len(search_paragraphs_by_regex(text, f'^{hyphen}.*')))

print(f'Paragraphs starting with en dash ⟨{en_dash}⟩:', end='\t')
print(len(search_paragraphs_by_regex(text, f'^{en_dash}.*')))

print(f'Paragraphs starting with em dash ⟨{em_dash}⟩:', end='\t')
print(len(search_paragraphs_by_regex(text, f'^{em_dash}.*')))

**Print paragraphs starting with the dash of interest. Specify the dash of interest in the `dash_of_interest` variable.**

In [None]:
dash_of_interest = hyphen
print_paragraphs_by_regex(text, f'^{dash_of_interest}.*')

**Replace the dash that opens dialogue paragraphs with the appropriate one.**

* **`old_dash` — dash symbol to be replaced;**
* **`new_dash` — new dash symbol of interest.**

In [None]:
old_dash = hyphen
new_dash = em_dash

text = replace_by_regex(text, f'(^|\n)+{old_dash}.?', old_dash, new_dash)
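
The same effect can be sketched with the standard library alone: with `re.MULTILINE`, `^` matches at the start of every paragraph, so only paragraph-initial hyphens are touched. The sample dialogue is made up.

```python
import re

sample = '- Hello, said he.\nA well-known fact.\n- Goodbye!'

# Replace a hyphen only when it is the first character of a paragraph.
fixed = re.sub(r'^-', '—', sample, flags=re.MULTILINE)
print(fixed)
```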

**Print paragraphs with the new dashes.**

In [None]:
dash_of_interest = em_dash
print_paragraphs_by_regex(text, f'^{dash_of_interest}.*')

## Number intervals

**Find all number intervals with dashes.**

In [None]:
print(f'Number intervals with hyphen ⟨{hyphen}⟩:')
hyphen_intervals = get_number_intervals_by_dash(text, hyphen)
print(hyphen_intervals)

print(f'Number intervals with en dash ⟨{en_dash}⟩:')
en_dash_intervals = get_number_intervals_by_dash(text, en_dash)
print(en_dash_intervals)

print(f'Number intervals with em dash ⟨{em_dash}⟩:')
em_dash_intervals = get_number_intervals_by_dash(text, em_dash)
print(em_dash_intervals)

**Replace `old_dash` in number intervals with `new_dash`.**

In [None]:
old_dash = hyphen
new_dash = en_dash
text = replace_by_regex(text, rf'\d+ ?{old_dash} ?\d+', old_dash, new_dash)

**Remove space characters in the number intervals. Example: `123 – 456` → `123–456`.**

In [None]:
text = replace_by_regex(text, rf'\d+ {en_dash} \d+', ' ', '')
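
Both interval steps (changing the dash and dropping the spaces) can be sketched as a single `re.sub` with capture groups. This is an alternative formulation under the assumption that every digit-dash-digit sequence is an interval; ISO dates or phone numbers would also match, so review the detected intervals first. The sample sentence is made up.

```python
import re

sample = 'See pages 12-15, 20 - 25 and the years 1939–1945.'

# Capture both numbers and rebuild the interval with an unspaced en dash.
fixed = re.sub(r'(\d+) ?[-–] ?(\d+)', r'\1–\2', sample)
print(fixed)
```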

**Check all number intervals with dashes.**

In [None]:
print(f'Number intervals with hyphen ⟨{hyphen}⟩:')
hyphen_intervals = get_number_intervals_by_dash(text, hyphen)
print(hyphen_intervals)

print(f'Number intervals with en dash ⟨{en_dash}⟩:')
en_dash_intervals = get_number_intervals_by_dash(text, en_dash)
print(en_dash_intervals)

print(f'Number intervals with em dash ⟨{em_dash}⟩:')
em_dash_intervals = get_number_intervals_by_dash(text, em_dash)
print(em_dash_intervals)

## Dashes surrounded by spaces

**Find `query` dashes surrounded by spaces.**

In [None]:
query = hyphen
set(re.findall(rf'\S+ {query} \S+', text))

**Replace `old_dash` surrounded by spaces with `new_dash`. Repeat the previous step until no more such dashes are found.**

In [None]:
old_dash = hyphen
new_dash = em_dash
text = replace_by_regex(text, rf'\S+ {old_dash} \S+', old_dash, new_dash)

## Dashes with spaces on one or both sides

**Search for `query` with spaces on one or both sides.**

In [None]:
query = hyphen
print(set(re.findall(rf'\S+{query} | {query}\S+| {query} ', text)))

**Replace `old_dash` that has a space on one or both sides with `new_dash`.**

In [None]:
old_dash = hyphen
new_dash = em_dash
text = replace_by_regex(text, rf'\S+{old_dash} | {old_dash}\S+| {old_dash} ', old_dash, new_dash)

**Print paragraphs with a `query` dash.**

In [None]:
query = en_dash
print_paragraphs_by_query(text, query, case_matters=False)

## Dashes surrounded by letters

**Find `query` dashes surrounded by letters.**

In [None]:
query = em_dash
print(set(re.findall(rf'\w+{query} | {query}\w+|\w+{query}\w+', text)))

**Replace `old_dash` surrounded by letters with `new_dash`.**

In [None]:
old_dash = em_dash
new_dash = hyphen
text = replace_by_regex(text, rf'\w+{old_dash} | {old_dash}\w+|\w+{old_dash}\w+', old_dash, new_dash)

## Replace dashes in the interactive mode

**Replace `old_dash` with `new_dash` in interactive mode. Leave input fields empty if no replacement is needed, otherwise type any characters.**

In [None]:
old_dash = hyphen
new_dash = em_dash

text = replace_interactively(text, old_dash, new_dash)

print('\033[1mDash type\tDash\tCount\033[0m')
print('Old dash', old_dash, text.count(old_dash), sep='\t')
print('New dash', new_dash, text.count(new_dash), sep='\t')

## Direct replacement

**If applicable, replace all dashes `old_dash` with `new_dash` directly.**

In [None]:
old_dash = en_dash
new_dash = em_dash
text = text.replace(old_dash, new_dash)

print('\033[1mDash type\tDash\tCount\033[0m')
print('Old dash', old_dash, text.count(old_dash), sep='\t')
print('New dash', new_dash, text.count(new_dash), sep='\t')

# <span id='characters-in-the-play'>Characters in the play</span>

[Up](#up) | [Save text](#save-text)

**Paragraphs with speaking characters start with their names. The names should be enclosed in `<b></b>` tags.**

**List all the characters of the play in the `characters` list variable.**

In [None]:
characters = [
    'Greg',
    'Samp',
    'Abr',
    'Tyb',
    'Ben',
    'Mont',
    'Prince',
    'Romeo',
    'Juliet',
    'Friar L'
]

**Print all the dialogue paragraphs with their numbers. `dialogue_delimiter` specifies what punctuation mark separates characters’ names from their speeches. Use backslash `\` before symbols reserved for regular expressions.**

In [None]:
dialogue_delimiter = r'\.'
characters_regex = '|'.join(characters)
characters_regex = f'^({characters_regex}){dialogue_delimiter}.*'
print(f'Regular expression to search for dialogues:\n{characters_regex}\n')
print_paragraphs_by_regex(text, characters_regex)

**Get the numbers of the dialogue paragraphs. Edit the `dialogue_paragraphs` variable if some of the paragraphs do not fit.**

In [None]:
print('Numbers of dialogue paragraphs:')
dialogue_paragraphs = search_paragraphs_by_regex(text, characters_regex)
print(dialogue_paragraphs)

**Enclose characters’ names in `<b></b>` tags.**

In [None]:
text = tag_characters(text, dialogue_paragraphs, characters, '.', tag='b')
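
A minimal sketch of the tagging idea is given below. `tag_names_sketch` is a hypothetical function: unlike `tag_characters`, it works on every paragraph instead of a list of paragraph numbers, and placing the delimiter inside the tag is a guess about the intended output.

```python
import re


def tag_names_sketch(text: str, names: list, delimiter: str = '.', tag: str = 'b') -> str:
    """Wrap a speaker name (plus delimiter) at the start of a paragraph in a tag."""
    # Longer names first, so that a short name cannot shadow a longer one.
    alternatives = '|'.join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(rf'^(?:{alternatives}){re.escape(delimiter)}', flags=re.MULTILINE)
    return pattern.sub(lambda m: f'<{tag}>{m.group(0)}</{tag}>', text)


sample = 'Romeo. Is the day so young?\nBen. But new struck nine.\nRomeo was late.'
tagged = tag_names_sketch(sample, ['Romeo', 'Ben'])
print(tagged)
```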

**Merge speeches of characters with `<delimiter>` if this is poetry.**

**NB! The `merge_speeches` function merges all the paragraphs not containing `<b>` and `<h1>` tags! These paragraphs may not be parts of speeches.**

In [None]:
text = merge_speeches(text)
print('Current statistics:')
print(f'• {len(text)} symbols;')
print('•', len(text.split('\n')), 'paragraphs.')

# <span id='poetry'>Poetry</span>

[Up](#up) | [Save text](#save-text)

**Join strophes with `<delimiter>`, so that each strophe becomes a separate paragraph. A strophe here is a part of the text separated from the rest by more than one line break (`\n`).**
* **`start_line` — 0-based index of the first line where function starts to act;**
* **`min_line_breaks` — minimal line breaks count between strophes.**

In [None]:
print('Before:', text.count('\n')+1, 'paragraphs.')
text = join_strophes(text, delimiter='<delimiter>', start_line=2, min_line_breaks=2)
print('After:', text.count('\n')+1, 'paragraphs.')
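
The strophe-joining step can be sketched as a split on runs of line breaks followed by a join of the lines inside each strophe. `join_strophes_sketch` is a hypothetical function that ignores `start_line`; the real `join_strophes` may differ in details.

```python
import re


def join_strophes_sketch(text: str, delimiter: str = '<delimiter>',
                         min_line_breaks: int = 2) -> str:
    """Turn each strophe into one paragraph whose lines are joined by `delimiter`."""
    # Strophes are separated by at least `min_line_breaks` line breaks.
    strophes = re.split(r'\n{%d,}' % min_line_breaks, text.strip('\n'))
    # Inside a strophe, any shorter run of line breaks separates two lines.
    return '\n'.join(delimiter.join(re.split(r'\n+', s)) for s in strophes)


sample = 'Line one\nLine two\n\nLine three\nLine four'
joined = join_strophes_sketch(sample)
print(joined)
```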

**Detach paragraphs with specified `tags` which are glued with delimiter.**

In [None]:
tags = ['h1']
delimiter = '<delimiter>'

print('Before:', text.count('\n')+1, 'paragraphs.')
text = detach_paragraphs(text, tags, delimiter)
print('After:', text.count('\n')+1, 'paragraphs.')

# <span id='dot-the-yo'>Dot the ⟨Ё⟩ letters in Russian</span>

[Up](#up) | [Save text](#save-text)

**Use Ёditor to dot the ⟨Ё⟩ letters in a Russian text. For more information about the tool, visit its [GitHub page](https://github.com/bilinguator/yoditor).**

## Load database

**To start, specify the location of the `yoditor` directory with all Ёditor contents in the `yoditor_dir_path` variable. If it is located in the same directory as `redactor`, leave `yoditor_dir_path` equal to `'..'`.**

In [None]:
yoditor_dir_path = '..'
project_dir = os.path.abspath(yoditor_dir_path)
sys.path.insert(0, project_dir)
import yoditor.yoditor as yoditor

## Dot the sure ⟨Ё⟩ letters

**Dot the ⟨Ё⟩ letters in the words that are always spelled with them.**

In [None]:
text = yoditor.recover_yo_sure(text)

## Dot the unsure ⟨Ё⟩ letters

**Dot the ⟨Ё⟩ letters in words whose spelling is ambiguous. This is done in interactive mode: an input field is shown for every such word. If the replacement is needed, type the `ё` letter in the field; otherwise leave it empty. Press Enter to confirm your choice.**

In [None]:
text = yoditor.recover_yo_unsure(text)

**Find paragraphs with words of ambiguous spelling. If `query`’s case matters, specify it in the `case_matters` attribute of the `print_paragraphs_by_query` function.**

In [None]:
query = 'вперёд'
print_paragraphs_by_query(text, query, case_matters=False)

# <span id='headings'>Headings</span>

[Up](#up) | [Save text](#save-text)

**Detect paragraphs containing chapters’ headings. Specify arguments in the `detect_chapters` function:**

**`chapter` - string containing key word for a chapter, e.g. “Chapter” for English, “Глава” for Russian, “Chapitre” for French, etc.; word case matters here; the space character delimiting the key word from a numeral is the part of this argument;**

**`numbering` - string of one of the following variants:**
* **`"arabic"` for Arabic numbers (1, 2, 3, etc.);**
* **`"roman"` for the Roman numbers (I, II, III, IV, etc.);**
* **`"ja"` or `"zh"` for Japanese or Chinese numerals (一、二、三、四, etc.);**
* **`"ar"`, `"fa"` or `"ur"` for the Arabic-Persian numerals (۱, ۲, ۳, etc.);**
* **`"text"` for non-numeric characters; suitable if numerals in words are presented (‘One’, ‘Two’, ‘Three’, etc.);**

**For other options for `numbering` use `help(detect_chapters)`**.

**`delimiter` - delimiter between the chapter numbering and the title;**

**`with_title` - boolean specifying if chapters have titles;**

**`numbering_first` - boolean specifying if numbering precedes chapter key word (‘XIX chapter’).**

**Example**

If the chapters’ headings of your text look like `Chapter MMMCMXCIX — Epilogue`, specify arguments as follows:

`chapter = "Chapter "`

`numbering = "roman"`

`delimiter = " — "`

`with_title = True`

`numbering_first = False`
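
To show what such a detection amounts to, the sketch below builds a regular expression for the example heading above. The Roman numeral pattern and the variable names are assumptions for illustration; the actual implementation of `detect_chapters` is not shown in this notebook.

```python
import re

# Roman numerals from I to MMMCMXCIX; the lookahead forbids an empty match.
ROMAN = r'(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})'

# chapter='Chapter ', numbering='roman', delimiter=' — ', with_title=True
heading = re.compile(rf'^Chapter (?P<number>{ROMAN}) — (?P<title>.+)$')

paragraphs = [
    'Chapter I — Prologue',
    'Chapter one of my life',
    'Chapter MMMCMXCIX — Epilogue',
]

# 0-based numbers of the paragraphs that look like chapter headings.
detected = {i for i, p in enumerate(paragraphs) if heading.match(p)}
print(detected)
```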

In [None]:
print('Chapters detected:\n')
chapter_paragraphs = detect_chapters(text, chapter='', numbering='arabic', delimiter='. ',
                                     with_title=True, numbering_first=True)
print('\nTotal count:', len(chapter_paragraphs))
print('Paragraphs set:', chapter_paragraphs, sep='\n')

**Check paragraphs you want to add.**

In [None]:
paragraph_to_check = 2
text.split('\n')[paragraph_to_check]

**If there are paragraph numbers to be added or removed from the set, specify them in the `added_paragraphs` and `removed_paragraphs` variables respectively.**

In [None]:
added_paragraphs = set()
removed_paragraphs = set()
chapter_paragraphs = (chapter_paragraphs | added_paragraphs) - removed_paragraphs
print('Paragraphs set:', *sorted(chapter_paragraphs), sep='\n')

**Enclose the detected chapters’ headings in `<h1></h1>` tags.**

In [None]:
text = tag_paragraphs(text, chapter_paragraphs, tag='h1')

## Change headings case

**Change case of headings to upper `case='upper'`, lower `case='lower'`, or first letter upper `case='capitalised'`.**

In [None]:
text = change_headings_case(text, chapter_paragraphs, case='capitalised', tags=['h1'])

# <span id='split-text'>Split text</span>

[Up](#up) | [Save text](#save-text)

**Plot how many files, and with how many paragraphs each, result from splitting the text with different numbers of chapters per file.**

* **`by` — delimiter by which to divide text into chapters (it is not deleted from the text), specify `<h1>` to divide text by headers into actual chapters;**
* **`chapters_in_first_file` — defines how many chapters to write in the first file, default is 3 to include the author, title (tagged with `<h1></h1>`) and first chapter parts;**
* **`chapters_per_file` — plotted range (both sides included) of chapters written to the second and all subsequent files.**

In [None]:
plot_paragraphs_counts(text, by='<h1>', chapters_in_first_file=13, chapters_per_file=(1, 20))

**Split the text into several parts with convenient counts of chapters and paragraphs.**

In [None]:
text_splitted = split_text(text, by='<h1>', chapters_in_first_file=3, chapters_per_file=1, verbose=True)
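
The splitting logic described above can be sketched as follows. `split_by_heading` is a hypothetical function that returns a list of strings; the return type and exact grouping rules of the real `split_text` are assumptions.

```python
import re


def split_by_heading(text: str, by: str = '<h1>',
                     chapters_in_first_file: int = 3,
                     chapters_per_file: int = 1) -> list:
    """Split text before every `by` marker and group the chapters into parts."""
    # The zero-width lookahead keeps the marker at the start of each chapter.
    chapters = [c for c in re.split(rf'(?={re.escape(by)})', text) if c]
    first = chapters[:chapters_in_first_file]
    rest = chapters[chapters_in_first_file:]
    groups = [first] + [rest[i:i + chapters_per_file]
                        for i in range(0, len(rest), chapters_per_file)]
    return [''.join(g).strip() for g in groups if g]


sample = ('<h1>Author</h1>\n<h1>Title</h1>\n<h1>Chapter 1</h1>\nText one.\n'
          '<h1>Chapter 2</h1>\nText two.\n<h1>Chapter 3</h1>\nText three.')
parts = split_by_heading(sample)
for number, part in enumerate(parts, start=1):
    print(f'Part {number}: {part.count(chr(10)) + 1} paragraphs')
```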

## Split the file into several parts

In [None]:
split_file(file_address, text_splitted)

## Merge the split files into one

In [None]:
merge_file(file_address)

# <span id='save-text'>Save text</span>

[Up](#up)

**Specify another file address in the `file_address` variable if needed.**

In [None]:
text = '\n'.join(re.split('\n+', text)).strip()
file_address = file_address
with open(file_address, 'w', encoding='utf-8') as file:
    file.write(text)