Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: slug is empty if filename is non-ASCII #22118

Merged
merged 6 commits into from
Nov 16, 2022

Conversation

EugeneTorap
Copy link
Contributor

@EugeneTorap EugeneTorap commented Nov 14, 2022

If create a chart only with non-ASCII characters then we have _<id>.yaml filename and can't import this file because the filename starts with underscore.
This PR fix the issue just add a condition if chart_slug is empty then use only <id>.yaml filename.
Fixed the issue for database, dataset, chart and dashboard export.

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@EugeneTorap EugeneTorap changed the title Fix chart_slug is empty if chart name is non-ASCII fix chart_slug is empty if chart name is non-ASCII Nov 14, 2022
@EugeneTorap EugeneTorap changed the title fix chart_slug is empty if chart name is non-ASCII fix: chart_slug is empty if chart name is non-ASCII Nov 14, 2022
@codecov
Copy link

codecov bot commented Nov 14, 2022

Codecov Report

Merging #22118 (653f818) into master (5b67e07) will decrease coverage by 1.57%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #22118      +/-   ##
==========================================
- Coverage   67.12%   65.54%   -1.58%     
==========================================
  Files        1831     1832       +1     
  Lines       69993    69837     -156     
  Branches     7570     7570              
==========================================
- Hits        46983    45775    -1208     
- Misses      21045    22097    +1052     
  Partials     1965     1965              
Flag Coverage Δ
hive ?
mysql 78.15% <100.00%> (-0.25%) ⬇️
postgres 78.21% <100.00%> (-0.25%) ⬇️
presto ?
python 78.34% <100.00%> (-3.24%) ⬇️
sqlite 76.67% <100.00%> (-0.26%) ⬇️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/charts/commands/export.py 94.11% <100.00%> (ø)
superset/dashboards/commands/export.py 83.78% <100.00%> (ø)
superset/databases/commands/export.py 91.48% <100.00%> (ø)
superset/datasets/commands/export.py 79.59% <100.00%> (-8.17%) ⬇️
superset/utils/file.py 100.00% <100.00%> (ø)
superset/tables/schemas.py 0.00% <0.00%> (-100.00%) ⬇️
superset/columns/schemas.py 0.00% <0.00%> (-100.00%) ⬇️
superset/db_engines/hive.py 0.00% <0.00%> (-85.19%) ⬇️
...set/advanced_data_type/plugins/internet_address.py 16.32% <0.00%> (-79.60%) ⬇️
superset/utils/pandas_postprocessing/boxplot.py 20.51% <0.00%> (-79.49%) ⬇️
... and 67 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@EugeneTorap EugeneTorap changed the title fix: chart_slug is empty if chart name is non-ASCII fix: slug is empty if filename is non-ASCII Nov 14, 2022
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! And thanks for the comprehensive tests, great work! LGTM

@betodealmeida betodealmeida merged commit 394fb2f into apache:master Nov 16, 2022
@EugeneTorap EugeneTorap deleted the fix/chart-slug-is-empty branch November 16, 2022 18:56
@artemonsh
Copy link
Contributor

@EugeneTorap @villebro
Could you please take a look at this issue: #21657?

There is a quite annoying problem with cyrillic languages like Russian. In short, dashboards, charts and databases containing only cyrillic letters cannot be imported into Superset! The function Eugene provided, for instance, converts chart title "Мой график" ("My chart") into an empty string in python, leading to this type of filename: _123.yaml. And as the first character in the filename is underscore, the is_valid_config function returns False for this filename, which then prohibits to import the file, despite the notification in the bottom right corner saying that everything was imported successfully :/

Therefore, charts/dashboards/databases with these types of titles cannot not be imported into Superset. My suggestion is to expand werkzeug's secure_filename function as follows:

import unicodedata


def secure_filename(filename: str) -> str:
    r"""Pass it a filename and it will return a secure version of it.  This
    filename can then safely be stored on a regular file system and passed
    to :func:`os.path.join`.

    On windows systems the function also makes sure that the file is not
    named after one of the special device files.

    The function also takes filenames containing cyrillic letters.

    >>> secure_filename("My cool movie.mov")
    'My_cool_movie.mov'
    >>> secure_filename("../../../etc/passwd")
    'etc_passwd'
    >>> secure_filename('i contain cool \xfcml\xe4uts.txt')
    'i_contain_cool_umlauts.txt'
    >>> secure_filename('Мой красивый график.yaml')
    'Мой_красивыи_график.yaml'

    The function might return an empty filename.  It's your responsibility
    to ensure that the filename is unique and that you abort or
    generate a random filename if the function returned an empty one.

    .. versionadded:: 0.5

    :param filename: the filename to secure
    """
    # If the text contains cyrillic letters, ASCII encoding should not
    # be used as it does not contain cyrillic letters
    contains_cyrillic_letters = bool(re.search("[\u0400-\u04FF]", filename))

    _windows_device_files = (
        "CON",
        "AUX",
        "COM1",
        "COM2",
        "COM3",
        "COM4",
        "LPT1",
        "LPT2",
        "LPT3",
        "PRN",
        "NUL",
    )

    _filename_ascii_strip_re = re.compile(r"[^A-Za-z0-9_.-]")
    _filename_strip_re = (
        re.compile(r"[^A-Za-zа-яА-ЯёЁ0-9_.-]")
        if contains_cyrillic_letters
        else _filename_ascii_strip_re
    )

    filename = unicodedata.normalize("NFKD", filename)
    if not contains_cyrillic_letters:
        filename = filename.encode("ascii", "ignore").decode("ascii")

    for sep in os.path.sep, os.path.altsep:
        if sep:
            filename = filename.replace(sep, " ")
    filename = str(_filename_strip_re.sub("", "_".join(filename.split()))).strip("._")

    # on nt a couple of special files are present in each folder.  We
    # have to ensure that the target file is not such a filename.  In
    # this case we prepend an underline
    if (
        os.name == "nt"
        and filename
        and filename.split(".")[0].upper() in _windows_device_files
    ):
        filename = f"_{filename}"

    return filename

Before:

>>>print(secure_filename("Мой красивый график"))
>>>print(secure_filename("My beautiful график"))
>>>print(secure_filename("My beautiful chart"))

My_beautiful
My_beautiful_chart

After:

>>>print(secure_filename("Мой красивый график"))
>>>print(secure_filename("My beautiful график"))
>>>print(secure_filename("My beautiful chart"))
Мои_красивыи_график
My_beautiful_график
My_beautiful_chart

@EugeneTorap
Copy link
Contributor Author

Hi @artemonsh! My PR fix the issue with Cyrillic characters.
If no, pls text me in superset slack. I have the same nickname there.

@EugeneTorap
Copy link
Contributor Author

@artemonsh I've read you suggested code. You only added Cyrillic support without others utf-8 characters like: hieroglyphs, emojis and so on. My PR supports import/export for all utf-8 characters.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants