# Dev Tasks

- ➕ Add the LaTeX command "\newline" after equations that start with a single "$"
    - Even better: add the same command before the equation, but twice
        - HOWEVER, this problem is not triggered if the author writes the equations with double "$"s
- ➕ remove markdown comments from embedded files
- ➕ Embedded refs, when a certain section is referenced: Need to change the hierarchy of potential "inner sections"
- ➕ Non-embedded external links --> remove the Markdown link format, and add a setting that lets the user choose whether to convert the referenced note to PDF as well and hyperlink to it from the original PDF

- ➕ section recognition from embedded notes does not work (test with "Assignment--11.md" and see "# Embedded-Section-Error" comment in "embedded_notes.py")
- ➕ Make it possible for internal links to have LaTeX print the page number, in case the reader wants to print the document
- ⚠➕ Inside a hyperlink, an underscore makes LaTeX expect a subscript: [error link](https://tex.stackexchange.com/questions/292037/url-causes-missing-inserted-error). Example: "\hyperlink{sNO Intuitive-Explanation}{ADD_NAME}" must become "\hyperlink{sNO Intuitive-Explanation}{ADD\_NAME}"
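
The underscore fix could be sketched roughly as follows. This is a minimal sketch, assuming that neither the target nor the visible text of a `\hyperlink` contains braces; `escape_underscores_in_hyperlink_text` is a hypothetical helper, not part of the converter.

```python
import re

# Matches \hyperlink{target}{text}; assumes neither part contains braces
HYPERLINK_PATTERN = re.compile(r'\\hyperlink\{([^{}]*)\}\{([^{}]*)\}')


def escape_underscores_in_hyperlink_text(line):
    """Escape bare underscores in the visible text of every \\hyperlink."""
    def fix(match):
        target, text = match.group(1), match.group(2)
        # Escape only underscores that are not already preceded by a backslash
        text = re.sub(r'(?<!\\)_', r'\\_', text)
        return '\\hyperlink{' + target + '}{' + text + '}'

    return HYPERLINK_PATTERN.sub(fix, line)


print(escape_underscores_in_hyperlink_text(
    '\\hyperlink{sNO Intuitive-Explanation}{ADD_NAME}'))
```

The target part is left untouched on purpose; only the text that LaTeX typesets is escaped.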


## Math-related
- ⚠ When parentheses are part of the reference note name, the regex recognition fails
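
One common cause of this kind of failure is building a pattern from the raw note name, so that "(" and ")" are read as a regex group. The sketch below assumes that this is what happens here (the actual pattern lives in code not shown in this notebook); `reference_pattern` is a hypothetical helper.

```python
import re


def reference_pattern(note_name):
    """Build a pattern that matches [[note_name]] or ![[note_name]] literally."""
    # re.escape neutralizes regex metacharacters such as "(" and ")"
    return re.compile(r'!?\[\[' + re.escape(note_name) + r'\]\]')


line = 'From [[Support Vector Machine (SVM)]]'
note_name = 'Support Vector Machine (SVM)'

# Without escaping, "(SVM)" is a group that matches "SVM" without parentheses
unescaped = re.search(r'\[\[' + note_name + r'\]\]', line)
escaped = reference_pattern(note_name).search(line)

print(unescaped is None, escaped is not None)
```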


## Error cases
- Try these two lines: 
From [[Support Vector Machine (SVM)]]
[[kernel]]

For some reason, the code merges them into one line.

## Edge cases
- ➕ Consider the case where there is more than one section with the same name
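
One way to handle duplicates is to give every repeated name a numeric suffix before it is used as a label. A minimal sketch, assuming sections are stored as `[line_index, name]` pairs like the `sections` list built later in this notebook; `disambiguate_sections` and the `--N` suffix format are hypothetical.

```python
from collections import Counter


def disambiguate_sections(sections):
    """Return [line_index, name, label] triples in which every label is unique."""
    totals = Counter(name for _, name in sections)
    seen = Counter()
    out = []
    for line_index, name in sections:
        seen[name] += 1
        if totals[name] > 1:
            # Repeated names get a running number: "Example--1", "Example--2", ...
            label = name + '--' + str(seen[name])
        else:
            label = name
        out.append([line_index, name, label])
    return out


example_sections = [[0, 'Intro'], [4, 'Example'], [9, 'Example']]
for entry in disambiguate_sections(example_sections):
    print(entry)
```

A link that points to a repeated name would still need a rule for which occurrence it refers to; this sketch does not decide that.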


For the markdown comment removal, use for testing:
- [[p514--notes]]



# How to use
## Prerequisites
1. Have Python 3 installed



## Usage
All user-defined parameters live in the `PARS` dictionary, located in the [User Parameters](#user-parameters) section.

To set the paths of the .md file to be converted and of the resulting .tex file, change `PARS['📁']['markdown-file']` and `PARS['📁']['tex-file']`.
Then, just run all code blocks and VOILA!
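
For example, both paths could be derived from a single note path. This is only a sketch: the folder and note name are placeholders, and the `PARS` below is a stand-in for the real dictionary.

```python
from pathlib import Path

# Placeholder location; replace with the note to convert (without extension)
note = Path('my-vault') / 'notes' / 'my-note'

# Stand-in for the real PARS dictionary defined under "User Parameters"
PARS = {'📁': {}}
PARS['📁']['markdown-file'] = str(note) + '.md'
PARS['📁']['tex-file'] = str(note) + '.tex'
PARS['📁']['vault'] = str(note.parents[1])  # two levels up: the vault folder

print(PARS['📁'])
```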

# Load Packages and helper functions

## Packages

In [2]:
import re
import sys
import glob, os
import numpy as np
from os.path import exists
from remove_markdown_comment import *
from symbol_replacements import *
from embedded_notes import *
from bullet_list__converter import *

## Helper functions

In [3]:
def conv_dict(D):
    for key, value in D.items():
        if value == '🟢':
            D[key] = True
        elif value == '🔴':
            D[key] = False
        elif isinstance(value, dict):
            D[key] = conv_dict(value)
    return D


# Used by identify__tables and the main loop; the wildcard imports above may already provide them
is_in_table_line = lambda x: x.startswith('|') and x.endswith('|')
enum             = lambda x: enumerate(x)


# PARAMETERS

## Global Constants (to not be changed)

In [4]:
ID__TABLES__alignment__center = 0
ID__TABLES__alignment__right  = 1
ID__TABLES__alignment__middle = 2


ID__TABLES__PACKAGE__longtblr   = 0
ID__TABLES__PACKAGE__tabularx   = 1
ID__TABLES__PACKAGE__long_table = 2

ID__CNV__TABLE_STARTED      = 0
ID__CNV__TABLE_ENDED        = 1
ID__CNV__IDENTICAL          = 2

ID__STYLE__BOLD             = 0
ID__STYLE__HIGHLIGHTER      = 1

# ⚠ does not work for longtblr!
CMD__TABLE__TABULARX__CENTERING = '\\newcolumntype{Y}{>{\\centering\\arraybackslash}X}'


## User Parameters

In [5]:
path_files  = 'C:\\Users\\mariosg\\OneDrive - NTNU\\FILES\\workTips\\'
path0       = path_files + 'AUTOMATIONS\\'
path_file   = path_files + 'P-Tasks\\✔\\Assignment--11'


PARS = conv_dict({
    '⚙': # SETTINGS
        {'TABLES':{
                            'package': ID__TABLES__PACKAGE__long_table,
                'hlines-to-all-rows': '🔴',
                 'any-hlines-at-all': '🔴',
                         'alignment': [
                                        ID__TABLES__alignment__center,
                                        ID__TABLES__alignment__middle],
                        'rel-width': 1.2
                },
                      'margin': '0.9in',
                  'EXCEPTIONS': 
                                {
                                    'raise_exception__when__embedded_reference_not_found': '🔴'
                                    },
                     'EMBEDDED REFERENCES':  
                                        {'convert non embedded references': '🔴'}  # if True, references such as "[[another note]]" are changed to "another note". If False, they remain as is
                                          },                
    '📁':
           {
                'markdown-file': path_file+'.md',  # Markdown (.md) file for conversion
                     'tex-file': path_file+'.tex',  # LaTeX (.tex) file (converted from the .md file)
                        'vault': path_files
            },
    'par':
        {
            'tabular-package':
                            {
                                       'names': ['longtblr', 'tabularx'],
                                'before-lines': ['{colspec}']
                            },
            'packages-to-load':[                    # Which packages to load in the LaTeX preamble
                                'hyperref',
                                'graphicx',
                                'amssymb',           # need more symbols
                                'titlesec',          # so that we can add more subsections (using 'paragraph')
                                'xcolor, soul',      # for the highlighter
                                'amsmath',
                                'amsfonts',
                                'cancel'
                                ],
          'symbols-to-replace': [       # Obsidian symbol, latex symbol,            type of replacement (1 or 2)
                                        ['✔',              '\\checkmark',            1],
                                        ['🟢',              '$\\\\blacklozenge$',    2],
                                        ['🔴',              '\\\\maltese',           2],
                                        ['➕',              '\\boxplus',             2],
                                        ['🔗',              'LINK',                  1],
                                        ['\\implies',       '\\Rightarrow',           1],
                                        ['❓',              '?',                      1],
                                        ['❌',              'NO',                     1],
                                        ['🤔',               '',                      1],
                                        ['⚠',               '!!',                      1],
                                        ['\\text',          '\\textnormal',           1]
                                        ]
        }
        
})


# Rest of code

In [7]:
def package_loader():

    packages_to_load    = []
    packages_to_load +=PARS['par']['packages-to-load']
    
    tables_package      = PARS['⚙']['TABLES']['package']
    page_margin         = PARS['⚙']['margin']

    if tables_package == ID__TABLES__PACKAGE__longtblr:

        packages_to_load.append('tabularray')
        packages_to_load.append('longtable')

    elif tables_package == ID__TABLES__PACKAGE__tabularx:
        
        packages_to_load.append('tabularx')

    elif tables_package == ID__TABLES__PACKAGE__long_table:

        packages_to_load.append('longtable')

    else:
        raise Exception("Nothing coded for this case")

        

    out = ['\\usepackage{'+x+'}' for x in packages_to_load]
    
    
    if len(page_margin) > 0:
        out.append('\\usepackage[margin='+ page_margin + ']{geometry}')
 
    
    return out


def replace_hyperlinks(S):
    

    # Anything that isn't a square closing bracket
    name_regex = "[^]]+"
    # http:// or https:// followed by anything but a closing paren
    url_regex = "http[s]?://[^)]+"

    markup_regex = r'\[({0})]\(\s*({1})\s*\)'.format(name_regex, url_regex)

    S_1 = []
    for s in S:
        s1 = s 

        for match in re.findall(markup_regex, s1):
            markdown_link = '[' + match[0] + '](' + match[1] + ')'
            latex_link = "\\href{" + match[1] + "}{" + match[0] + "}"
            s1 = s1.replace(markdown_link, latex_link)

        S_1.append(s1)
    
    return S_1

def identify__tables(S):

    table_indexes = []
    table_has_started = False
    for i, l in enum(S):
        lstr = l.lstrip().rstrip()
        is_table_line = is_in_table_line(lstr)        
        if is_table_line and (not table_has_started):
            table_has_started = True
            idx__table_start = i
        # ⚠ NEVER add "or (i == len(S)-1)" to the condition below    
        elif (not is_table_line and table_has_started):
            table_has_started = False
            idx__table_end = i
            table_indexes.append(idx__table_start)
            table_indexes.append(idx__table_end)

    return table_indexes



def simple_stylistic_replacements(S, type=None):


    '''
    For simple stylistic replacements. Includes conversions of:
    - Bold font
    - Highlighted font
    
    '''

    if type == ID__STYLE__BOLD:
        style_char = '\*\*'
        replacement_func = lambda repl, string:  repl.append(['**'+string+'**', '\\textbf{' + string + '}'])
        l = 2
    
    elif type == ID__STYLE__HIGHLIGHTER:
        style_char = '\=\='
        replacement_func = lambda repl, string:  repl.append(['=='+string+'==', '\hl{' + string + '}'])
        l = 2
    else:
        raise Exception('NOTHING CODED HERE!')

    S1 = []
    for s in S:
        occurences = [x.start() for x in re.finditer(style_char, s)]
        L = len(occurences)

        if L % l == 0:
            replacements = []
            for i in range(int(L/l)):
                o0 = occurences[l*i]
                o1 = occurences[l*i+1]
                replacement_func(replacements, s[o0+l:o1])
                
            for R in replacements:
                s = s.replace(R[0], R[1])
        else:
            raise Exception("error for this case, for now")
        
        S1.append(s)
    
    return S1

 
def convert__tables(S):
    '''
    Converts tables depending on the user's preferences    
    '''

    TABLE_SETTINGS = PARS['⚙']['TABLES']
    package = TABLE_SETTINGS['package']
    add_txt = ''
    if (ID__TABLES__alignment__center in TABLE_SETTINGS['alignment']) \
        and package == ID__TABLES__PACKAGE__longtblr:
        add_txt = '\centering '


    # After having found the table
    ## We expect that the 1st line defines the columns

    cols = S[0].split('|')
    cols = [[x.lstrip().rstrip() for x in cols if len(x)>0 and x!='\n']]

    data = []
    for s in S[2:]:
        c = s.split('|')
        c = [x.lstrip().rstrip() for x in c if len(x.lstrip().rstrip())>0 and x!='\n']
        data.append(c)

    y = cols + data

    # CONVERT
    N_cols = len(cols[0])

    latex_table = []
    for i, c in enum(y):
        c1 = [add_txt + x for x in c]
        # Reset for every row, so that an \hline set for the header does not leak into the rows below
        addText = ''
        if i==0: 
            if TABLE_SETTINGS['any-hlines-at-all']:
                addText = ' \\hline'
        else:
            if TABLE_SETTINGS['hlines-to-all-rows']:
                addText = ' \\hline'
        latex_table.append('    ' + " & ".join(c1) + ' \\\\' + addText)

    lbefore = []


    if package == ID__TABLES__PACKAGE__tabularx:


        PCKG_NAME = '{tabularx}'

        if ID__TABLES__alignment__center in TABLE_SETTINGS['alignment']:
            lbefore.append(CMD__TABLE__TABULARX__CENTERING)
            colPrefix = 'Y'
        else:
            colPrefix = 'X'

        if (ID__TABLES__alignment__middle in TABLE_SETTINGS['alignment']):
            lbefore.append('\\renewcommand\\tabularxcolumn[1]{m{#1}}')

        latex_before_table = lbefore + [
            '\\begin{center}',
            '\\begin'+PCKG_NAME+'{\\textwidth}{' + '|' + N_cols*(colPrefix+'|') + '}',
            '   \hline'
        ]

        latex_after_table = [
            '   \hline',
            '\end'+PCKG_NAME,
            '\end{center}'
        ]

        LATEX = latex_before_table + latex_table + latex_after_table

    elif package == ID__TABLES__PACKAGE__longtblr:

        PCKG_NAME = '{longtblr}'

        latex_before_table = [
            '\\begin{center}',
            '\\begin' + PCKG_NAME + '[',
            'caption = {},',
            'entry = {},',
            'label = {},',
            'note{a} = {},',
            'note{$\dag$} = {}]',
            '   {colspec = {'+ N_cols*'X' +'}, width = ' + str(TABLE_SETTINGS['rel-width']) + '\linewidth, hlines, rowhead = 2, rowfoot = 1}'
            ]  

        latex_after_table = [
            '\end' + PCKG_NAME,
            '\end{center}'
        ]

        add_hline_at_end = False # to be moved to user settings
        if add_hline_at_end:
            latex_after_table = ['   \\hline'] + latex_after_table  # list + list; a bare string would raise a TypeError


        LATEX = latex_before_table + latex_table + latex_after_table


    elif package == ID__TABLES__PACKAGE__long_table:
        PCKG_NAME = '{longtable}'

        latex_before_table=[
        	'\\begin{center}',
		    '   \\begin{longtable}{' + N_cols*'c' + '}',
			'   \caption{} \\\\',
			'   \hline',
			'   '+latex_table[0],
			'   \hline',
			'   \endfirsthead % Use \endfirsthead for the line after the first header',
			'   \hline',
			'   \endfoot',
            ]

        latex_after_table = [
            '   \end' + PCKG_NAME,
            '\end{center}'
        ]

        LATEX = latex_before_table + ['    '+x for x in latex_table[1:]] + latex_after_table
    else:
        raise Exception('NOTHING CODED HERE!')
    return LATEX


def images_converter(images):

    '''
    Converts Images given the path of the image file
    '''

    # NOTES:
    # --- ", height=0.5\\textheight" addition causes the aspect ratio to break

    TO_PRINT = []

    for IM in images:
        path_img = IM[1].replace('\\', '/')  # the quotes are added once, when the \includegraphics line is built
        label_img = IM[1].split('\\')[-1]
        caption_short = 'Caption short'
        caption_long = 'Caption long'

        TO_PRINT.append(' \n'.join([
        '\\begin{figure}',
        '	\centering',
        '	\includegraphics[width=0.7\linewidth]'+\
            '{"'+path_img+'"}',
        '	\caption['+caption_short+']{'+caption_long+'}',
        '	\label{fig:'+label_img+'}',
        '\end{figure}']))

    return TO_PRINT

def add_new_line_equations(S0):

    # This function assumes that the '\n' symbol hasn't been added yet
    S = S0
    for i, s in enum(S):

        if not s.endswith('$$') and s.endswith('$'):
            if i < len(S)-1:  # the last line has no successor
                S[i+1] = '\n' + S[i+1]

        if not s.startswith('$$') and s.startswith('$'):
            if i>0: 
                if not S[i-1].endswith('\n'):
                    S[i-1] = S[i-1] + '\n'*2
                else:
                    S[i-1] = S[i-1] + '\n'

        # if not s.endswith('$$') and s.endswith('$'):
        #     if i<len(S):

    return S


PATHS = PARS['📁']

with open(PATHS['markdown-file'], 'r', encoding='utf8') as f:
    content = f.readlines()


content = remove_markdown_comments(content)

# UNFOLD EMBEDDED NOTES ============================================================================================================================================
md__files_embedded_prev0 = []
md__files_embedded_prev = md__files_embedded_prev0.copy()
[content, md__files_embedded_new] = unfold_embedded_notes(content, md__files_embedded_prev, PARS)

while md__files_embedded_prev0 != md__files_embedded_new:
    md__files_embedded_prev0 = md__files_embedded_new.copy()
    md__files_embedded_prev = md__files_embedded_prev0.copy()
    [content, md__files_embedded_new] = unfold_embedded_notes(content, md__files_embedded_prev, PARS)

# ======================================================================================================================================================================

# Convert bullet and numbered lists.
content = bullet_list_converter(content)


# Replace headers and map sections \==================================================
Lc = len(content)-1
sections = []
# ⚠ The sequence of replacements matters: 
# ---- replace the lowest-level subsections first
HEADER_LEVELS = [
    ['#### ', '\\paragraph'],
    ['### ',  '\\subsubsection'],
    ['## ',   '\\subsection'],
    ['# ',    '\\section'],
]
for i in range(Lc+1):
    # Drop stray comment markers once, before any header comparison,
    # so that their removal alone is never mistaken for a header replacement
    content[i] = content[i].replace('%%', '')

    for md_prefix, latex_cmd in HEADER_LEVELS:
        content_0 = content[i]
        # Anchor at the start of the line, so that "# " inside running text is not treated as a header
        content[i] = re.sub('^' + md_prefix + r'(.*)', lambda m: latex_cmd + '{' + m.group(1) + '}', content[i])
        if content[i] != content_0:
            sections.append([i, content_0.replace(md_prefix, '').replace('\n', '')])

# \==================================================\==================================================

# find reference blocks \==================================================
#---1. they have to be at the end of the sentence (i.e. before "\n")
blocks = []
for i in range(Lc+1):
    s = content[i].replace('\n', '')
    pattern = r"\^[\w\-]*$"
    link_label = re.findall(pattern, s)
    if len(link_label) > 0:
        blocks.append([i, link_label[0].replace('^', '')])    
# \==================================================

# Find and apply internal links
internal_links = internal_links__identifier(content)
content = internal_links__enforcer(content, [sections, blocks], internal_links)
#

# Convert figures \==================================================

embeded_refs = embedded_references_recognizer(content)

# ➕ add more image refs
# replace "content[line_number]" accordingly and see the result

for i, ln in enum(embeded_refs):

    line_number = ln[0]
    line_refs = ln[1]
    for lnrf in line_refs:

        # print(embedded_references_path_finder(lnrf[0]))
        converted_image_text = images_converter([[line_number, embedded_references_path_finder(lnrf[0], PARS)]])
        
        for img_txt_cnv in converted_image_text:
            tmp1 = '![[' + lnrf[0]
            if ('.png' in lnrf[0] or '.jpg' in lnrf[0]) and (lnrf[1].replace('|','')).isnumeric():
                content[line_number] = content[line_number].replace(tmp1 + lnrf[1] + ']]', img_txt_cnv)
            else:
                content[line_number] = content[line_number].replace(tmp1 + ']]', img_txt_cnv)


# \==================================================
content = add_new_line_equations(content)

IDX__TABLES = [0]
TYPE_OF_CNV = [ID__CNV__IDENTICAL]
tmp1 = identify__tables(content)
tmp2 = [ID__CNV__TABLE_STARTED for _ in tmp1]
tmp2[1::2] = [ID__CNV__IDENTICAL for _ in tmp1[1::2]]
IDX__TABLES += tmp1
TYPE_OF_CNV += tmp2

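# Use len(content) as the final boundary: slices exclude their end index,
# so stopping at len(content)-1 would drop the last line of the document
Lc = len(content)
if IDX__TABLES[-1] < Lc: 
    IDX__TABLES.append(Lc)
    TYPE_OF_CNV.append(ID__CNV__IDENTICAL)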

LATEX_TABLES = []
for i in range(int(len(tmp1)/2)):
    LATEX_TABLES.append(convert__tables(content[tmp1[2*i]:tmp1[2*i+1]]))


# for i, L in enum(content):

#     for idx_table in IDX__TABLES:
#         LATEX_TABLES.append(convert__tables(content[idx_table[0]:idx_table[1]]))
content = symbol_replacement(content, PARS)   
content = simple_stylistic_replacements(content, type=ID__STYLE__BOLD)
content = simple_stylistic_replacements(content, type=ID__STYLE__HIGHLIGHTER)

if PARS['⚙']['EMBEDDED REFERENCES']['convert non embedded references']:
    content = non_embedded_references_converter(content)

LATEX = []
i0 = IDX__TABLES[0]
i_tables = 0
for j, i in enum(IDX__TABLES[1:]):
    if TYPE_OF_CNV[j] == ID__CNV__IDENTICAL:
        LATEX += content[i0:i]
    elif TYPE_OF_CNV[j] == ID__CNV__TABLE_STARTED:
        LATEX += LATEX_TABLES[i_tables]
        i_tables += 1
    
    i0 = i
    
LATEX = replace_hyperlinks(LATEX)

PREAMBLE = ['\documentclass{article}'] + package_loader() + ['\n'] + ['\sethlcolor{yellow}'] + ['\n'] + ['\n'*2] + ['\setcounter{secnumdepth}{4}'] + ['\\begin{document}']


LATEX = PREAMBLE + LATEX + ['\end{document}']
with open(PATHS['tex-file'], 'w', encoding='utf8') as f:
    for l in LATEX:
        if not l.endswith('\n'): l+='\n'
        f.write(l)



# Debugging

In [87]:
embedded_references_recognizer(['This is ![[Pasted image 20221127213454.png|500]]'])

[[0, [('Pasted image 20221127213454.png', '|500')]]]

## LAB

In [81]:
all_chars = '\w' + SPECIAL_CHARACTERS + '\[\]'

pattern = r"\[([^\]]+)\]"
regexMdLinks = '/\[([^\[]+)\](\(.*\))'
s = '[some example]' 
s = '[Could not install packages due to an OSError: WinError 5 Access is denied](https://stackoverflow.com/questions/73339138/could-not-install-packages-due-to-an-oserror-winerror-5-access-is-denied)' 
match = re.findall(regexMdLinks, s)
match


# pattern = r"\[([^\]]+)\]\(([^\)]+)\)"

# pattern = r"\[([^\]]+)\]\(([^\)]+)\)"
# pattern = r"\[([^\[\]]+)\]\(([^\(\)]+)\)"
# pattern = r"\[([^\[\]]+)\]\(([^\(\)]+)\)"


# s = 'example with [linking a website](https://stackoverflow.com/questions/73339138/could-not-install-packages-due-to-an-oserror-winerror-5-access-is-denied)' 

# match = re.findall(pattern, s)
# match

# import re

# pattern = r"\[([^\[\]]+)\]\(([^\(\)]+)\)"

# text = r"[some sentence with [brackets] or (parentheses) inside it](some website)"

# match = re.findall(pattern, text)
# match






[]

In [82]:
# Anything that isn't a square closing bracket
name_regex = "[^]]+"
# http:// or https:// followed by anything but a closing paren
url_regex = "http[s]?://[^)]+"
text = '[Could not install packages due to an OSError: WinError 5 Access is denied](https://stackoverflow.com/questions/73339138/could-not-install-packages-due-to-an-oserror-winerror-5-access-is-denied)' 

markup_regex = '\[({0})]\(\s*({1})\s*\)'.format(name_regex, url_regex)


for match in re.findall(markup_regex, text):
    print(match)
    markdown_link = '[' + match[0] + '](' + match[1] + ')'
    latex_link = "\\href{" + match[1] + "}{" + match[0] + "}"
    print(text.replace(markdown_link, latex_link))


('Could not install packages due to an OSError: WinError 5 Access is denied', 'https://stackoverflow.com/questions/73339138/could-not-install-packages-due-to-an-oserror-winerror-5-access-is-denied')
\href{https://stackoverflow.com/questions/73339138/could-not-install-packages-due-to-an-oserror-winerror-5-access-is-denied}{Could not install packages due to an OSError: WinError 5 Access is denied}


In [32]:
def remove_markdown_comments(S):
    '''
    Removes Obsidian comments (text enclosed in "%%"),
    including comments that span several lines.
    '''
    result = []
    in_comment = False  # True while inside a comment that is still open
    for line in S:
        kept = []
        pos = 0
        while True:
            marker = line.find("%%", pos)
            if marker == -1:
                # No more markers: keep the rest only if we are outside a comment
                if not in_comment:
                    kept.append(line[pos:])
                break
            if not in_comment:
                kept.append(line[pos:marker])
            # Every "%%" toggles between text and comment
            in_comment = not in_comment
            pos = marker + 2
        result.append("".join(kept))
    return result


S = [
    'First line is gooooood',
    'Second line has %%comment%%',
    'Third line has %% two %% comments %% yall%%',
    'Fourth line has %% starting comment',
    'which %% ends in fifth %% but starts again %%'
]

# Expected: ['First line is gooooood', 'Second line has ', 'Third line has  comments ', 'Fourth line has ', ' ends in fifth ']
remove_markdown_comments(S)

['First line is gooooood',
 'Second line has ',
 'Third line has  comments ',
 'Fourth line has ',
 ' ends in fifth ']

# Notes

## Internal links/crossrefs

Using this format:

\section{Hello World}
\label{sec:hello}


\hyperref[sec:hello]{Word of text}


### Strategy
1. Add the label with the same name as in the Obsidian note. Append it to the existing line as "\n \label{sec:label}" instead of creating a new line
2. Map the sections and blocks so that links can be matched to them easily
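
The strategy above could look roughly like this. A minimal sketch, assuming labels are prefixed with `sec:` as in the format shown earlier and that spaces in the name are replaced by hyphens; `make_label`, `section_with_label` and `internal_link` are hypothetical helpers, not the converter's actual functions.

```python
def make_label(section_name):
    """Derive a label from the Obsidian section name."""
    # Assumption: spaces are replaced so that the label is a single token
    return 'sec:' + section_name.strip().replace(' ', '-')


def section_with_label(section_name):
    """Return the section and its label as ONE list element (step 1)."""
    return ('\\section{' + section_name + '}'
            + '\n \\label{' + make_label(section_name) + '}')


def internal_link(section_name, link_text):
    """Return a \\hyperref that points at the label of the given section."""
    return '\\hyperref[' + make_label(section_name) + ']{' + link_text + '}'


print(section_with_label('Hello World'))
print(internal_link('Hello World', 'Word of text'))
```

Keeping the label inside the same list element means the number of elements in `content` does not change when labels are added.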



## Limitations

### Hyperlinks
- The pattern does not account for link text that contains nested brackets
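
A bracket-counting scan is one way around this limitation. The sketch below is an assumption about how it could be done, not the converter's current behaviour; `find_markdown_links` is a hypothetical helper, and it only handles `http(s)` URLs that contain no parentheses.

```python
def find_markdown_links(text):
    """Return (name, url) pairs, allowing nested [brackets] inside the name."""
    links = []
    i = 0
    while i < len(text):
        if text[i] != '[':
            i += 1
            continue
        # Walk forward until the bracket opened at position i is closed again
        depth, j = 0, i
        while j < len(text):
            if text[j] == '[':
                depth += 1
            elif text[j] == ']':
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if j >= len(text):
            i += 1  # this bracket is never closed; try the next character
            continue
        # A link needs "(" immediately after the closing bracket
        if text[j + 1:j + 2] == '(':
            end = text.find(')', j + 2)
            url = text[j + 2:end].strip() if end != -1 else ''
            if url.startswith(('http://', 'https://')):
                links.append((text[i + 1:j], url))
                i = end + 1
                continue
        i += 1
    return links


sample = '[text with [brackets] inside](https://example.com) and [[kernel]]'
print(find_markdown_links(sample))
```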


### LaTeX cannot understand Windows emojis

--> Use symbols from [this list of symbols](https://milde.users.sourceforge.net/LUCR/Math/mathpackages/amssymb-symbols.pdf) instead, together with the `\usepackage{amssymb}` command



## Programming mistakes/weaknesses in the code
1. Redundant replacement in: "⚠WARNING--1" (search for it)