
Low memory error/crash #22

Closed
johnchoiniere opened this issue Aug 7, 2014 · 5 comments
@johnchoiniere

I was building a PITCHf/x database from scratch using pitchRx and repeatedly had system crashes from low memory. The crashes were more frequent in RStudio, but happened both there and when running the script from the command line: RStudio would make it roughly a year, and the command line roughly two years, before crashing.

Code I was running:

```r
library(pitchRx)
library(dplyr)
files <- c("inning/inning_all.xml", "inning/inning_hit.xml", "miniscoreboard.xml", "players.xml")
db <- src_mysql("pitchrx", host = NULL, port = [redacted], user = "root", password = "[redacted]")
scrape(start = "2008-01-01", end = "2008-12-31", suffix = files, connect = db$con)
scrape(start = "2009-01-01", end = "2009-12-31", suffix = files, connect = db$con)
scrape(start = "2010-01-01", end = "2010-12-31", suffix = files, connect = db$con)
scrape(start = "2011-01-01", end = "2011-12-31", suffix = files, connect = db$con)
scrape(start = "2012-01-01", end = "2012-12-31", suffix = files, connect = db$con)
scrape(start = "2013-01-01", end = "2013-12-31", suffix = files, connect = db$con)
scrape(start = "2014-01-01", end = Sys.Date() - 1, suffix = files, connect = db$con)
```

I was able to work around the issue by querying for `gameday_link`, sorting to find the most recent date scraped, deleting rows from all tables where that date appeared in the link, and then modifying the code to restart at that date.
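The workaround above can be sketched roughly as follows. This is an illustration, not code from the thread: the table name `atbat`, the `gid_YYYY_MM_DD` prefix in `gameday_link`, and the use of DBI directly are all assumptions about the pitchRx/MySQL schema; adjust them to match your database.

```r
# Hypothetical sketch of the workaround: find the last (possibly partial)
# day written to the database, delete its rows everywhere, and resume there.
library(DBI)

# Most recent gameday_link in one of the tables ("atbat" is an assumption).
last_link <- dbGetQuery(db$con, "SELECT MAX(gameday_link) AS last FROM atbat")$last

# Assuming links look like "gid_2011_06_14_...", pull out the date portion.
date_part <- substr(last_link, 5, 14)  # e.g. "2011_06_14"

# That day may be incomplete, so remove it from every table before resuming.
for (tbl in dbListTables(db$con)) {
  dbExecute(db$con, sprintf(
    "DELETE FROM %s WHERE gameday_link LIKE '%%%s%%'", tbl, date_part))
}

# Restart the scrape at that date.
resume <- as.Date(gsub("_", "-", date_part))
scrape(start = resume, end = "2011-12-31", suffix = files, connect = db$con)
```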

@cpsievert
Owner

Thanks @johnchoiniere. I actually haven't used a mysql connection with scrape yet, but I have a feeling your issues were a consequence of your machine having insufficient memory to pull an entire year at once. I'm hoping to have a more elegant solution for memory management in future versions.
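If whole-year pulls are indeed what exhausts memory, one workaround (not a pitchRx feature, just a sketch) is to scrape in smaller chunks so less XML is held in memory before it reaches the database:

```r
# Sketch: scrape 2008 month-by-month instead of in a single call.
# `files` and `db` are assumed to be defined as in the original script.
starts <- seq(as.Date("2008-01-01"), as.Date("2008-12-01"), by = "month")
ends   <- c(starts[-1] - 1, as.Date("2008-12-31"))
for (i in seq_along(starts)) {
  scrape(start = starts[i], end = ends[i], suffix = files, connect = db$con)
  gc()  # release memory between chunks
}
```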

@johnchoiniere
Author

Is there a way to clear any memory the script is using between years? Or is the solution just to run independent scripts for each year?

@cpsievert
Owner

You could try calling `gc()` after `scrape()` is done if you don't want to restart the session.
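Applied to the original script, that suggestion would look something like this (a sketch; whether it prevents the crash depends on how much memory each yearly pull needs at its peak):

```r
# Run garbage collection between yearly scrapes instead of restarting R.
scrape(start = "2008-01-01", end = "2008-12-31", suffix = files, connect = db$con)
gc()  # free memory no longer referenced before starting the next year
scrape(start = "2009-01-01", end = "2009-12-31", suffix = files, connect = db$con)
gc()
```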

@johnchoiniere
Author

Thanks!

@cpsievert
Owner

Closing since this is a duplicate of #27 (which has a more complete report of the issue).
