
Fixed '[Errno 36] File name too long' issue making it impossible to save comment scrapes with long titles. #19

Merged
1 commit merged on Oct 7, 2020

Conversation

@LukeDSchenk (Contributor) commented Oct 6, 2020

Overview

Summary

Added a _check_len() function to the Export.NameFile() class to ensure generated filenames are not too long (overly long filenames cause an error when writing scrapes to files). Added a _check_len() call to Subreddit, Redditor, and comments scraping. These changes should prevent scrapes from failing due to overly long generated filenames.

Motivation/Context

When using the comment scrape option, scrapes are automatically written to a file whose name includes the title of the comment thread. When a thread title is rather long (140+ characters), the overly long filename can cause an error and the scrape will fail to write to the designated file. In a nutshell, when scraping comment threads with long titles you would sit and wait for the scrape to finish only to find out that your data was lost due to a bad filename :).
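For context (this snippet is illustrative only, not URS code), a minimal sketch of why the write fails, assuming a standard Linux NAME_MAX of 255 bytes per filename component (some stacked filesystems such as eCryptfs allow far less, which is why even shorter names can fail):

```python
# Illustrative sketch: why long thread titles break file writes.
# Most Linux filesystems (ext4, XFS, Btrfs) cap a single filename
# component at NAME_MAX = 255 bytes.
NAME_MAX = 255

title = "x" * 300  # stand-in for a very long comment-thread title
filename = f"c-{title}-RAW.json"  # URS-style scrape filename pattern

# Check the encoded byte length, since the kernel limit applies to bytes.
too_long = len(filename.encode("utf-8")) > NAME_MAX
print(too_long)  # True: writing this file would raise OSError [Errno 36]
```

A filename that exceeds this limit makes open() raise OSError with errno 36 (ENAMETOOLONG), which matches the log output quoted below.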

New Dependencies

None

Issue Fix or Enhancement Request

Not applicable

Type of Change

  • Bug Fix (non-breaking change which fixes an issue)

Breaking Change

Not applicable (I have included some scrape logs for reference anyway)

List All Changes That Have Been Made

  • Created a _check_len() function in the Export.NameFile() class
    • Checks the length of a raw filename and shortens it if need be
  • Added a _check_len() call to all scrape types, just before the invalid-character validation
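The real implementation lives in urs/utils/Export.py; the cutoff value and exact behavior below are assumptions for illustration only — a minimal sketch of what a filename-length guard like _check_len() might look like:

```python
MAX_FILENAME_BYTES = 100  # assumed cutoff; the real value in URS may differ

def _check_len(filename: str) -> str:
    """Shorten a raw filename if it would exceed the length cap.

    The byte length is checked rather than the character count, since
    filesystem limits such as Linux's NAME_MAX apply to bytes.
    """
    encoded = filename.encode("utf-8")
    if len(encoded) <= MAX_FILENAME_BYTES:
        return filename
    # Truncate at the byte limit and drop any partial multi-byte character
    # left at the cut point.
    return encoded[:MAX_FILENAME_BYTES].decode("utf-8", errors="ignore")
```

Calling this on the raw filename before appending the export suffix keeps every generated name safely under the filesystem limit, regardless of thread-title length.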

How Has This Been Tested?

  • Target
    • Attempted to run a comments scrape with limit set to 0:
      • Comment URL was for a post with a lengthy title (> 140 chars).
      • Ran python3 Urs.py -c https://www.reddit.com/r/AskReddit/comments/j5jb71/how_do_you_deal_with_an_overly_friendly_neighbor/ 0 --json.
      • Output:

        [2020-10-05 22:49:19,876] [CRITICAL]: AN ERROR HAS OCCURED WHILE EXPORTING SCRAPED DATA.
        [2020-10-05 22:49:19,877] [CRITICAL]: [Errno 36] File name too long: '../scrapes/10-05-2020/c-How do you deal with an overly friendly neighbor who asks too many questions about your life when you happen to be outdoors at the same time_-RAW.json'

Test Configuration

  • Python version: 3.8.2

  • Running on Linux Mint 20 Ulyana

    • Kernel version: 5.4.0-48-generic
    • x86_64

Dependencies

astroid==2.4.1
attrs==19.3.0
certifi==2020.4.5.1
chardet==3.0.4
colorama==0.4.3
coverage==5.1
idna==2.9
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
more-itertools==8.3.0
packaging==20.4
pluggy==0.13.1
praw==7.0.0
prawcore==1.3.0
prettytable==0.7.2
py==1.8.1
pylint==2.5.2
pyparsing==2.4.7
pytest==5.4.3
pytest-cov==2.10.0
requests==2.23.0
six==1.14.0
toml==0.10.0
update-checker==0.17
urllib3==1.25.9
wcwidth==0.2.4
websocket-client==0.57.0
wrapt==1.12.1

Checklist

Tip: You can check off items by writing an "x" in the brackets, e.g. [x].

  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code, including testing to ensure my fix is effective or that my feature works.
  • My changes generate no new warnings.
  • I have commented my code, providing a summary of the functionality of each method, particularly in areas that may be hard to understand.
  • I have made corresponding changes to the documentation.
  • I have performed a self-review of this Pull Request template, ensuring the Markdown file renders correctly.

Added _check_len() function to the NameFile() class to ensure generated filenames are not too long (and thus causing an error when trying to write scrapes to files). Added _check_len() call to Subreddit, Redditor, and comments scraping.
@codecov-commenter commented Oct 6, 2020

Codecov Report

Merging #19 into master will increase coverage by 0.04%.
The diff coverage is 87.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
+ Coverage   73.07%   73.11%   +0.04%     
==========================================
  Files          25       25              
  Lines        1998     2005       +7     
==========================================
+ Hits         1460     1466       +6     
- Misses        538      539       +1     
Impacted Files        Coverage Δ
urs/utils/Export.py   94.73% <87.50%> (-0.92%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b0a80b...1f7540b. Read the comment docs.

@LukeDSchenk (Contributor, Author) commented:
🌚🥴🌝

@JosephLai241 JosephLai241 merged commit ee8c041 into JosephLai241:master Oct 7, 2020
skiwheelr pushed a commit to skiwheelr/URS that referenced this pull request Feb 25, 2021
Fixed '[Errno 36] File name too long' issue making it impossible to save comment scrapes with long titles.
@JosephLai241 JosephLai241 added the bugfix Fixed a bug label Mar 26, 2021