
Fixed '[Errno 36] File name too long' issue making it impossible to save comment scrapes with long titles. #19

Merged
1 commit merged on Oct 7, 2020

Conversation

@LukeDSchenk (Contributor) commented Oct 6, 2020

Overview

Summary

Added a _check_len() function to the Export.NameFile() class to ensure generated filenames are not too long (overly long filenames cause an error when writing scrapes to files). Added a _check_len() call to Subreddit, Redditor, and comments scraping. These changes should prevent scrapes from failing due to overly long generated filenames.

Motivation/Context

When using the comment scrape option, scrapes are automatically written to a file whose name includes the title of the comment thread. When a thread title is rather long (140+ characters), the overly long filename can cause an error and the scrape will fail to write to the designated file. In a nutshell, when scraping comment threads with long titles you would sit and wait for the scrape to finish only to find out that your data was lost due to a bad filename :).
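For context (this snippet is illustrative only, not URS code), a minimal sketch of why the write fails, assuming a standard Linux NAME_MAX of 255 bytes per filename component (some stacked filesystems such as eCryptfs allow far less, which is why even shorter names can fail):

```python
# Illustrative sketch: why long thread titles break file writes.
# Most Linux filesystems (ext4, XFS, Btrfs) cap a single filename
# component at NAME_MAX = 255 bytes.
NAME_MAX = 255

title = "x" * 300  # stand-in for a very long comment-thread title
filename = f"c-{title}-RAW.json"  # URS-style scrape filename pattern

# Check the encoded byte length, since the kernel limit applies to bytes.
too_long = len(filename.encode("utf-8")) > NAME_MAX
print(too_long)  # True: writing this file would raise OSError [Errno 36]
```

A filename that exceeds this limit makes open() raise OSError with errno 36 (ENAMETOOLONG), which matches the log output quoted below.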

New Dependencies

None

Issue Fix or Enhancement Request

Not applicable

Type of Change

  • Bug Fix (non-breaking change which fixes an issue)

Breaking Change

Not applicable (I have included some scrape logs for reference anyway)

List All Changes That Have Been Made

  • Created a _check_len() function in the Export.NameFile() class
    • Checks the length of a raw filename and shortens it if need be
  • Added a _check_len() call to all scrape types, just before the invalid-character validation
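The real implementation lives in urs/utils/Export.py; the cutoff value and exact behavior below are assumptions for illustration only — a minimal sketch of what a filename-length guard like _check_len() might look like:

```python
MAX_FILENAME_BYTES = 100  # assumed cutoff; the real value in URS may differ

def _check_len(filename: str) -> str:
    """Shorten a raw filename if it would exceed the length cap.

    The byte length is checked rather than the character count, since
    filesystem limits such as Linux's NAME_MAX apply to bytes.
    """
    encoded = filename.encode("utf-8")
    if len(encoded) <= MAX_FILENAME_BYTES:
        return filename
    # Truncate at the byte limit and drop any partial multi-byte character
    # left at the cut point.
    return encoded[:MAX_FILENAME_BYTES].decode("utf-8", errors="ignore")
```

Calling this on the raw filename before appending the export suffix keeps every generated name safely under the filesystem limit, regardless of thread-title length.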

How Has This Been Tested?

  • Target
    • Attempted to run a comments scrape with limit set to 0:
      • Comment URL was for a post with a lengthy title (> 140 chars).
      • Ran python3 Urs.py -c https://www.reddit.com/r/AskReddit/comments/j5jb71/how_do_you_deal_with_an_overly_friendly_neighbor/ 0 --json.
      • Output:

        [2020-10-05 22:49:19,876] [CRITICAL]: AN ERROR HAS OCCURED WHILE EXPORTING SCRAPED DATA.
        [2020-10-05 22:49:19,877] [CRITICAL]: [Errno 36] File name too long: '../scrapes/10-05-2020/c-How do you deal with an overly friendly neighbor who asks too many questions about your life when you happen to be outdoors at the same time_-RAW.json'

Test Configuration

  • Python version: 3.8.2

  • Running on Linux Mint 20 Ulyana

    • Kernel version: 5.4.0-48-generic
    • x86_64

Dependencies

astroid==2.4.1
attrs==19.3.0
certifi==2020.4.5.1
chardet==3.0.4
colorama==0.4.3
coverage==5.1
idna==2.9
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
more-itertools==8.3.0
packaging==20.4
pluggy==0.13.1
praw==7.0.0
prawcore==1.3.0
prettytable==0.7.2
py==1.8.1
pylint==2.5.2
pyparsing==2.4.7
pytest==5.4.3
pytest-cov==2.10.0
requests==2.23.0
six==1.14.0
toml==0.10.0
update-checker==0.17
urllib3==1.25.9
wcwidth==0.2.4
websocket-client==0.57.0
wrapt==1.12.1

Checklist

Tip: You can check off items by writing an "x" in the brackets, e.g. [x].

  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code, including testing to ensure my fix is effective or that my feature works.
  • My changes generate no new warnings.
  • I have commented my code, providing a summary of the functionality of each method, particularly in areas that may be hard to understand.
  • I have made corresponding changes to the documentation.
  • I have performed a self-review of this Pull Request template, ensuring the Markdown file renders correctly.

Added _check_len() function to the NameFile() class to ensure generated filenames are not too long (and thus causing an error when trying to write scrapes to files). Added _check_len() call to Subreddit, Redditor, and comments scraping.
@codecov-commenter commented Oct 6, 2020

Codecov Report

Merging #19 into master will increase coverage by 0.04%.
The diff coverage is 87.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
+ Coverage   73.07%   73.11%   +0.04%     
==========================================
  Files          25       25              
  Lines        1998     2005       +7     
==========================================
+ Hits         1460     1466       +6     
- Misses        538      539       +1     
Impacted Files        Coverage Δ
urs/utils/Export.py   94.73% <87.50%> (-0.92%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b0a80b...1f7540b. Read the comment docs.

@LukeDSchenk (Contributor, Author) commented:
🌚🥴🌝

@JosephLai241 JosephLai241 merged commit ee8c041 into JosephLai241:master Oct 7, 2020
skiwheelr pushed a commit to skiwheelr/URS that referenced this pull request Feb 25, 2021
Fixed '[Errno 36] File name too long' issue making it impossible to save comment scrapes with long titles.
@JosephLai241 JosephLai241 added the bugfix Fixed a bug label Mar 26, 2021