new version makes Medium code blocks unreadable #272

Open
Smitty010 opened this issue Dec 10, 2023 · 7 comments

@Smitty010

It looks like there is a new version. I think there was an attempt to fix the issue with code blocks not rendering properly. However, I would say that the new solution is worse than what it used to do. Here's an example. Look at https://medium.com/mlearning-ai/powerhouse-in-your-pocket-how-tiny-llms-are-redefining-the-ai-landscape-fdf17718bc79.

Here's the first codeblock as it appears in the story
image

Here's how it appears in the markdown
image

I suspect that the point was to remove the surrounding "```" and let the markdown editor render it. Notice that you lost any comments.

The other, bigger problem is that whitespace is lost. For example, from the same story,
image

gets turned into (here I've removed the block markers to let it render)
image

Without the page's CSS, every line starts in column 0. Given the importance of whitespace in Python, that's not very helpful.

I'm sure this can't be an easy problem to solve or it would have happened already. I preferred the old rendering as we didn't have all of the html tags.

@deathau
Owner

deathau commented Dec 10, 2023

Hmm... There were some fixes for issues with rendering HTML in code blocks, and it looks like they've had some side effects. I'll look into it.

@deathau
Owner

deathau commented Dec 10, 2023

It looks like Medium code blocks are just terrible... Even without syntax highlighting applied, they're wrapped in a <pre><span> (no <code> in sight) and use <br> for newlines (therefore negating the whole point of even using a <pre>). Also, everything has random bundle-generated class names. In short, it's just horrendous in terms of machine readability.
I couldn't follow the original example, because Medium wants me to pay, but I did find a simple example here: https://blog.medium.com/code-blocks-with-syntax-highlighting-53343df53c4f
in which this:
image
looks like this in the source:

<pre class="mt mu mv mw mx my mz ni bo nj ba bj"><span id="e759" class="nk ne ew mz b bf nl nm l nn nh" data-selectable-paragraph=""><span class="hljs-comment">// highlighted code is easier to read</span><br><span class="hljs-keyword">function</span> <span class="hljs-title.function">newCodeBlock</span>() {<br>  <span class="hljs-keyword">return</span> “jazzy!”;<br>}</span></pre>

I can see from the classes that it's using highlight.js, but I can't really find anything to help with this case.
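
For illustration, here is a minimal sketch of how markup shaped like the snippet above could be flattened into a fenced Markdown block. It is not what the extension currently does. The names `decodeEntities` and `mediumPreToFence` are hypothetical, and it assumes the code sits in a single `<pre>` whose lines are separated by `<br>` and whose only nested tags are highlight.js `<span>`s.

```javascript
// Hypothetical helper: decode the handful of entities that show up in code.
function decodeEntities(text) {
  return text
    .replaceAll("&lt;", "<")
    .replaceAll("&gt;", ">")
    .replaceAll("&quot;", '"')
    .replaceAll("&#39;", "'")
    .replaceAll("&amp;", "&"); // decode "&amp;" last so "&amp;lt;" becomes "&lt;", not "<"
}

// Hypothetical helper: turn the HTML of one Medium-style <pre> into a fenced block.
function mediumPreToFence(preHtml, language = "") {
  const code = preHtml
    .replace(/<br\s*\/?>/gi, "\n") // Medium uses <br> instead of real newlines
    .replace(/<[^>]+>/g, "");      // drop <pre>, <span> and any other tags
  const fence = "`".repeat(3);
  return fence + language + "\n" + decodeEntities(code) + "\n" + fence;
}
```

Calling `mediumPreToFence(html, "js")` on the `<pre>` above should keep the comment line and the two-space indent, because the text is never passed through a Markdown renderer before being fenced.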

@deathau changed the title from "new version makes code blocks unreadable" to "new version makes Medium code blocks unreadable" on Dec 10, 2023
@Smitty010
Author

Smitty010 commented Dec 11, 2023 via email

@Smitty010
Author

I took the example I gave above and simply pasted the entire article into an empty Markdown file in Obsidian. Here's what I got for the code block (raw text; note that there were no ``` code block markers around it):

# THIS IS FOR THE BACKBONE ONLY  
with gr.Blocks(theme='ParityError/Interstellar') as demo:   
    #TITLE SECTION  
    with gr.Row():  
        with gr.Column(scale=12):  
            gr.HTML  #TITLE  
            gr.Markdown #TEXT and DESRIPTION  
        gr.Image #your image LOGO  
   # chat and parameters settings  
    with gr.Row():  
        with gr.Column(scale=4):  #CHATBOT AND USER INPUT  
            chatbot = gr.Chatbot #Chatbot box  
            with gr.Row(): # ROW for user input and button  
                with gr.Column(scale=14):  
                    msg = gr.Textbox  # User input  
                submitBtn = gr.Button #Submit button  
  
        with gr.Column(min_width=50,scale=1): #PARAMETERS SECTION  
                with gr.Tab(label="Parameter Setting"):  
                    gr.Markdown("# Parameters")  
                    top_p = gr.Slider  
                    temperature = gr.Slider  
                    max_length_tokens = gr.Slider  
                    rep_pen = gr.Slider  
  
                clear = gr.Button("🗑️ Clear All Messages", variant='secondary')  
      
    # HERE we have to create the fucntions to be called by the Push Buttons  
    def user(user_message, history):  
  
    def bot(history,t,p,m,r):  
  
    # Clicking the submitBtn will call the generation with Parameters in the slides  
    submitBtn.click #all parameters here  
    clear.click #actions to clear all  
  
#MAIN CALL SECTION      
demo.queue()  #required to yield the streams from the text generation  
demo.launch(inbrowser=True)

which renders as
image

Still a bit wonky (comments sometimes render as headers or lose the #), but acceptable. Even better, I was also able to select it, type three backticks, and get the correct code block (with comments, indents, etc.). I then marked the block as Python and got all of the highlighting. I could live with having to mark the code blocks if the text was there.

I have no idea where all of the HTML-to-text transformations take place (in Chrome, or in Obsidian's paste handling). I'm not a browser/UI guy.

As I said, probably not an easy fix. Sorry

@deathau
Owner

deathau commented Dec 11, 2023

Thanks for looking into this. From my understanding, Obsidian does the conversion on paste. It seems to be completely ignoring the <pre> tag and therefore not treating it as a code block at all (although it still keeps the spacing, which is interesting).

This extension is a little bit different as it runs the document through a "readability" engine first, which strips out unnecessary stuff like headers and footers, but also strips some of the classes and styling before it converts to Markdown.

To that end, if you select all of the text inside of a code block in Medium, then right-click -> Markdownload -> Copy selection as markdown, it seems to behave similarly to Obsidian. But if you select the code block itself (as would happen if you also selected text around it), it keeps all the HTML.

I'll look into it further (and if anyone else has any ideas, let me know)

@WayneDing

Mark

@Smitty010
Author

I generated the following templater template

<%*
// Templater script: strip leftover Medium HTML from the current note.
const curFile = await app.workspace.activeLeaf.view.file;  // get current file
let contents = await app.vault.read(curFile); // get file contents

contents = contents
            .replaceAll(/<span.*?>/g, "")
            .replaceAll("</span>", "")
            .replaceAll("<br>", "\n")
            .replaceAll("<strong>", "**")
            .replaceAll("</strong>", "**")
            .replaceAll("<em>", "*")
            .replaceAll("</em>", "*")
            .replaceAll("&lt;", "<")
            .replaceAll("&gt;", ">")
            .replaceAll("&amp;", "&"); // "&amp;" is "&" (not "%"); decode it last to avoid double decoding

await app.vault.modify(curFile, contents); // replace content with new content
-%>

This seems to clean up a lot of the blocks. It obviously has some limitations:

  1. It's pretty stupid in that it doesn't really look for blocks but simply replaces some embedded HTML tags with "appropriate" conversions. I wouldn't use it on an article about HTML.
  2. It doesn't help with comments that begin with a "#" character, as they don't make it into the block at all.
  3. I've only tried this on Medium articles.
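
One way to soften the first limitation would be to apply the replacements only inside `<pre>...</pre>` spans and leave the rest of the note alone. The following is a rough sketch, not a tested Templater script. The function name `cleanPreBlocks` is made up, and it assumes the saved Markdown still contains the raw `<pre>` elements (as in the Medium source shown earlier in this thread) and that they are not nested.

```javascript
// Hypothetical helper: clean up only the text between <pre> and </pre>.
function cleanPreBlocks(markdown) {
  const fence = "`".repeat(3);
  return markdown.replace(/<pre\b[^>]*>([\s\S]*?)<\/pre>/gi, (_match, inner) => {
    const code = inner
      .replace(/<br\s*\/?>/gi, "\n")     // <br> marks a line break inside Medium code blocks
      .replace(/<\/?span\b[^>]*>/gi, "") // remove highlight.js <span> wrappers
      .replaceAll("&lt;", "<")
      .replaceAll("&gt;", ">")
      .replaceAll("&amp;", "&");         // decode "&amp;" last to avoid double decoding
    // Emit a fenced block so Markdown keeps indentation and "#" comments as-is.
    return fence + "\n" + code + "\n" + fence;
  });
}
```

Inside the template, this could replace the chain of `replaceAll` calls, e.g. `contents = cleanPreBlocks(contents)`. Text outside `<pre>` elements, such as an article that discusses `<em>` or `<span>` tags, is returned unchanged.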


3 participants