Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check .shtml files for noindex directives #50

Merged
merged 3 commits into from
Jul 25, 2022

Conversation

TravisBrace
Copy link
Contributor

@TravisBrace TravisBrace commented Jul 20, 2022

Summary

Checks .shtml files for noindex directives in the head.

Hoping adding "shtml" would help treat index.shtml pages the same as index.html pages removing the from sitemap so pages end in /

Closing Issues

Closes #51

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvements to existing code, such as refactoring or optimizations (non-breaking)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@cicirello
Copy link
Owner

That's part of it. That alone will just cause the action to search for noindex directives within shtml files, excluding those that have it. Without your change, it currently just looks for noindex in files with extension html or htm. The index.html is dropped from the url somewhere else.

@TravisBrace
Copy link
Contributor Author

Oh nice! that would definitely help out for some of the older .shtml sites we keep on github.

Is this the other part here? I could add a

elif f == "index.shtml" :
return ""

To the pull request also.

def sortname(f, dropExtension=False) :
    """Partial url to sort by, which strips out the filename
    if the filename is index.html.
    Keyword arguments:
    f - Filename with path
    dropExtension - true to drop extensions of .html from the filename when sorting
    """
    if len(f) >= 11 and f[-11:] == "/index.html" :
        return f[:-10]
    elif f == "index.html" :
        return ""
    elif dropExtension and len(f) >= 5 and f[-5:] == ".html" :
        return f[:-5]
    else :
        return f

@cicirello
Copy link
Owner

@TravisBrace Yes, the sortname function is where that is handled. Or at least now it is. While looking at code to confirm your question I noticed I had some redundancy in there. I just committed a change to your PR to remove the redundancy so that the other place with that logic now just calls the sortname function.

I think you should now be able to add the elif you mentioned. You might actually need two of them. You'll notice the first if checks if the file path ends with "/index.html" and the elif after that checks if the path equals "index.html" (no slash). If I remember correctly, the reason it's not simply checking if it ends with "index.html" is for the possibly unlikely case of a filename like "somethingindex.html".

@cicirello cicirello changed the title update generatesitemap.py for shtml Drop index.shtml from URLs and check .shtml files for noindex directives Jul 22, 2022
@cicirello cicirello added the bug Something isn't working label Jul 22, 2022
TravisBrace and others added 2 commits July 25, 2022 11:28
Hoping adding "shtml" would help treat index.shtml pages the same as index.html pages removing the from sitemap so pages end in /

tests shtml as html extension

Co-Authored-By: Travis Brace <TravisBrace@gmail.com>
Co-Authored-By: Vincent A. Cicirello <762030+cicirello@users.noreply.github.com>
@cicirello cicirello changed the title Drop index.shtml from URLs and check .shtml files for noindex directives Check .shtml files for noindex directives Jul 25, 2022
@cicirello
Copy link
Owner

@TravisBrace I'm going to merge this now since it handles checking head of .shtml files for noindex directives. I created an issue for the thing you really want, dropping index.shtml from URLs. Feel free to submit an additional PR for that one. It should just involve changing the sortname function.

@cicirello cicirello merged commit e27344a into cicirello:master Jul 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check .shtml files for noindex directives
2 participants