Merged
jgoralcz added a commit that referenced this pull request on Nov 27, 2018:
* initial commit to development branch.
* Downloads html from target page
* Updated readme for dependencies as part of proper documentation. They will also be included in the sphinx documentation later
* Cleaning up formatting of README
* Added sphinx to project and created a bash file with the command to generate html output
* for Salman
* Finished documentation setup for Sphinx with autodoc example (using possible project structure)
* Copied Zach's code
* Updated version of Zach's https://github.com/zlmonroe/SER499/blob/solutions/Webcrawler/WebNavigator.py to find GitHub repo files
* Added a unit test
* Added names to corresponding URLs of files
* Added BSD license
* Made rationale in comments clearer
* Changed list of lists to list of tuples. Removed duplicate entries from list
* Travis without examples and a readme.md (#3)
* added base travis file and requirements text document. Need to test out and run a simple test.
* travis with pytest. Will update if team confirms unittest over pytest.
* simple test with travis to test with pytest.
* fixed tests with pytest and asserted. You can run `pytest` from the command line to see it passes the tests.
* fixed requirements.txt
* add README, simple numpy unit testing, renamed files.
* added reST style comments.
* tensorflow example code updated. TensorFlow extends unittest, but pytest still runs it. Need to decide if the team would like to use pytest for simple unit tests or go with unittest for all.
* moved files to pytest folder to prepare for unittest.
* modifying to work with OOP and unittest module.
* converted test cases to unittest.
* removed examples. Added a README.md
* requires at least a placeholder test case until we have 1 test case.
* Updated names. Fixed unit tests because the output of the getFileURLSFromGitHubRepo function was not guaranteed to be in the same order every time.
* Initial file retrieval. Can get content of file; still need to create and store the file
* Separated WebNavigator and GitHubScraper into different classes.
* Updated unit test to use GitHubScraper instead of WebNavigator
* fixed comments
* Modified function name in GitHubScraper to more accurately reflect its purpose
* Added downloading capability. No filter yet
* Added specific folder for files to download into
* Made files download into a folder named after the repo
* Appends to config.META
* Prettier config.META
* Moved tests to proper place
* fixed test
* Fixing test
* Fixed test again
* Added lxml requirement to travis
* Addressed change request
* More unit tests
* Debugging issues on why tests don't pass on Travis but they do on my machine
* Maybe this will make Travis happy?
* Added some code to see if the directory constructed in Travis is different from the one on my machine
* Modified tests so that tests for Windows and Linux run correctly
* Sphinx (#5)
* Updated readme for dependencies as part of proper documentation. They will also be included in the sphinx documentation later
* Cleaning up formatting of README
* Added sphinx to project and created a bash file with the command to generate html output
* Finished documentation setup for Sphinx with autodoc example (using possible project structure)
* Repo filtering (#9)
* Started searching for repos
* Got repo search to work for language, page number, and search string
* Committing progress during layover: Added repo filtering functionality and created framework for creating folders.
* Small scaffolding update to RepoStructure
* RepoStructure now creates things and is flexible if you need to not destroy a folder/file. RepoFilter spelling issues and python standards fixed.
* RepoStructure now creates batches of repos in one go.
* Added documentation
* Removed generated files
* Tried to make structure more consistent and hopefully fix JJ's import after, checking with Travis....
* attempting to fix travis: #1
* Fix for travis + better structure more standard to python applications
* Travis environment (#10)
* added base travis file and requirements text document. Need to test out and run a simple test.
* travis with pytest. Will update if team confirms unittest over pytest.
* simple test with travis to test with pytest.
* fixed tests with pytest and asserted. You can run `pytest` from the command line to see it passes the tests.
* fixed requirements.txt
* add README, simple numpy unit testing, renamed files.
* added reST style comments.
* tensorflow example code updated. TensorFlow extends unittest, but pytest still runs it. Need to decide if the team would like to use pytest for simple unit tests or go with unittest for all.
* moved files to pytest folder to prepare for unittest.
* modifying to work with OOP and unittest module.
* converted test cases to unittest.
* removed more travis examples.
* added installation instructions in README.md for creating an `Anaconda` and `virtualenv` virtual environment for future capstone projects and ours.
* changed versions back from >=1.12 to ==1.11 for tensorflow and tensorboard.
* updated readme: instead of `conda activate conda_decompy` it's now `source activate conda_decompy`
* updated readme and requirements.txt
* added environments to gitignore. Fixed README.md and changed setuptools to a compatible version.
* Us 58 identify download c files (#11)
* Added timing measurements. Turns out that most of the time taken is to download web content.
* Added some measurement code; concluded that there's no easy way to reduce download time
* Removing hexagon folder. Discovered that unit tests are not passing
* Removed code that tried to figure out Travis' file structure
* Removed junk from tests that were not passing
* added teardown to unit tests so the test directory is always clean
* Unit tests were overstringent; relaxed them so code would not be too constrained. Added setup method to unit tests
* Optimized scraping through parallel downloads. Current bottleneck is scraping through the pages.
* Removing downloaded test folder
* Committing before merging development into this branch
* Simplified getFileURLSFromGitHubRepo function
* Made some variables static in GitHubScraper in preparation for more multithreading
* Scraping is multithreaded.
* Multithreading finished. There is room for "GitHubScraper.downloadAllFiles" itself to be threadable, but right now it is pretty good.
* removed folders that weren't supposed to be there.
* Identified newest bottleneck: it's "getAbsoluteLinksFromPage" from WebNavigator. Will need to multithread/multiprocess to speed up significantly
* Commit to be able to use another computer
* Significantly improved time to download files from large repositories.
* Verified that downloading two repositories consecutively does not cause issues
* Fixed issue with 'NoneType' urls
* fixed makefile and generated docs
* fixed makefile and generated docs
* generated docs
* Delete unnecessary doc files from testing doc build
* added documentation files
* Repo filtering (#15)
* Started searching for repos
* Got repo search to work for language, page number, and search string
* Committing progress during layover: Added repo filtering functionality and created framework for creating folders.
* Small scaffolding update to RepoStructure
* RepoStructure now creates things and is flexible if you need to not destroy a folder/file. RepoFilter spelling issues and python standards fixed.
* RepoStructure now creates batches of repos in one go.
* Added documentation
* Removed generated files
* Tried to make structure more consistent and hopefully fix JJ's import after, checking with Travis....
* attempting to fix travis: #1
* Fix for travis + better structure more standard to python applications
* created unit test for repo filtering / structure and fixed some errors with RepoStructure having to do with abs paths. Also fixed docs to not use @ as in @param test:
* changed compare to set in RepoFilter since travis's folder order (likely locale issue) is different
* Cherry picked JJ's threading version after accidentally removing it in a merge conflict
* add index to point to correct file
* move nojekyll file so gh pages finds _static folder
* rebuilding
* Added documentation for Repo Filtering design decisions and made the index system more user friendly.
* Deleted accidentally added swap file
* Us56 gen llvm (#13)
* Beginning to create a program to auto compile to llvm
* Trying to figure out why these commands don't work
* Figured out how to do optimized and unoptimized code
* Untested method for generating LLVM for all files in txt file. Does not consider extra bash args
* can now save to specified location
* Finished writing class to iterate through cfiles and generate llvm. Examples included
* Forgot to include example llvm
* reorganized files
* Created skeleton test case and test environment.
* Having issues importing
* testing some unittest stuff
* Fixed a few test errors. Dealing with Python exports, which will be important for the whole team.
* Fixed importing issues. Found problem with code: it fails if the output folder does not exist
* Location directory now created if it does not exist. If compilation returns an error, GenLLVM now throws an exception
* Tests finished
* Tests fully clean up
* Final cleaning up. about to pull
* Cleaned up file directory
* Fixed to create and delete LLVM folder
* added comments
* Update
* Fixed tests to work with pytest-3
* attempt to add clang to travis
* add requirements to travis using before_install directive
* Made requirements less specific for clang compiler for travis
* travis already has clang. It is not the reason this branch is not running.
* made a fix for the tests
* added ability to customize optimizations
* Adding LLVM Gen documentation
* Building HTML
* Began documenting disassemblers. Fixed small window in sphinx html and added tab windows to sphinx for my page.
* Pushing generated html so website won't be broken
* Added more disassembler information
* Filter compile clang (#20)
* Updated readme for dependencies as part of proper documentation. They will also be included in the sphinx documentation later
* Cleaning up formatting of README
* Added sphinx to project and created a bash file with the command to generate html output
* Finished documentation setup for Sphinx with autodoc example (using possible project structure)
* added comments for planning.
* Started searching for repos
* Got repo search to work for language, page number, and search string
* initial setup for filtering a C file. Going to add OOP and static methods most likely
* updated to check headers.
* Committing progress during layover: Added repo filtering functionality and created framework for creating folders.
* Small scaffolding update to RepoStructure
* beginning test cases.
* added test files and validated tests
* RepoStructure now creates things and is flexible if you need to not destroy a folder/file. RepoFilter spelling issues and python standards fixed.
* RepoStructure now creates batches of repos in one go.
* Added documentation
* remove print statements.
* Removed generated files
* Tried to make structure more consistent and hopefully fix JJ's import after, checking with Travis....
* attempting to fix travis: #1
* Fix for travis + better structure more standard to python applications
* updated tests and updated max_bytes to max_bytes and min_bytes
* allowed c file to read in a folder and recursively check the data.
* added option to create 'filtered' folder if one did not exist.
* moved filter location so it creates the filter folder regardless.
* appended file paths to a file instead of moving them.
* created unit test for repo filtering / structure and fixed some errors with RepoStructure having to do with abs paths. Also fixed docs to not use @ as in @param test:
* fixed makefile and generated docs
* changed compare to set in RepoFilter since travis's folder order (likely locale issue) is different
* fixed makefile and generated docs
* generated docs
* Delete unnecessary doc files from testing doc build
* Cherry picked JJ's threading version after accidentally removing it in a merge conflict
* added documentation files
* add index to point to correct file
* move nojekyll file so gh pages finds _static folder
* rebuilding
* Added documentation for Repo Filtering design decisions and made the index system more user friendly.
* added sphinx documentation
* merged US-59-Filter-C-File and added database design decisions and filter design decisions.
* removed src/, changed filtercfiles to filter, and added a filter_list.META in the repo directory when running.
* tests to see if a file can compile using clang. Changed names on a few classes for consistency.
* Fix to download filter
* Chose a disassembler, added docs with design decisions, provided runner up
* Sqlite (#21)
* Table Creation in SQLite example
* ML and meta tables
* Most updated sqlite ML and Meta tables
* Insert data into the ML and meta tables
* Updated the ML table
* Delete tables added to ML
* refactored into one and small adjustments. Added transaction and create database if it does not exist.
* moved sql_transaction into the object. Need to thoroughly test it out.
* database design complete. Refactored and OOP. Need to test and document.
* beginning test cases.
* test cases finished. Pagination is done. Need feedback.
* added documentation and reasoning for the database.
* test cases now pass. Allowed an override for the transaction builder to execute immediately instead of after 1000 if set to True
* Added video to compilation docs. Begun rewriting compilation docs
* building html
* building html
* removed unnecessary html
* add file that wasn't added before
* rebuild html
* Update 3_clangSubproc.html
* Us 58 identify download c files (#23)
* Added timing measurements. Turns out that most of the time taken is to download web content.
* Added some measurement code; concluded that there's no easy way to reduce download time
* Removing hexagon folder. Discovered that unit tests are not passing
* Removed code that tried to figure out Travis' file structure
* Removed junk from tests that were not passing
* added teardown to unit tests so the test directory is always clean
* Unit tests were overstringent; relaxed them so code would not be too constrained. Added setup method to unit tests
* Optimized scraping through parallel downloads. Current bottleneck is scraping through the pages.
* Removing downloaded test folder
* Committing before merging development into this branch
* Simplified getFileURLSFromGitHubRepo function
* Made some variables static in GitHubScraper in preparation for more multithreading
* Scraping is multithreaded.
* Multithreading finished. There is room for "GitHubScraper.downloadAllFiles" itself to be threadable, but right now it is pretty good.
* removed folders that weren't supposed to be there.
* Identified newest bottleneck: it's "getAbsoluteLinksFromPage" from WebNavigator. Will need to multithread/multiprocess to speed up significantly
* Commit to be able to use another computer
* Significantly improved time to download files from large repositories.
* Verified that downloading two repositories consecutively does not cause issues
* Fixed issue with 'NoneType' urls
* Clean directory after test on Linux systems
* Refactored code so multithreading uses 8 threads at most. Still bottlenecked by something when scraping through large repositories
* Changed some lists to sets to improve performance. Still being bottlenecked somewhere
* Removed rate limiting
* Added ability to target download directory. Updated unit tests. They should fail right now
* Updated unit tests to delete leftover folders
* Reformatted unit test to meet common formatting standards
* Reformatted unit test to meet more formatting standards
* Improved performance when scraping through repositories with directory structures that resemble a linked list
* Completed task 152
* Fixed downloading not-C files. For real this time
* Modified META timestamp to YYYY-MM-DD HH:MM:SS format
* Initial work on Task 187
* Progress on multithreading
* Making progress on multithreading
* Making progress on multithreading
* Network IO performance is maximized. Current bottleneck is file IO to storage
* Multithreading is mostly done. Also fixed an issue when writing files with weird encodings
* Removed deprecated functions
* Cleaned up new scraper
* Fixed unit tests
* fixed some stuff
* Removed print statements, fixed tests
* Created documentation
* Removed deprecated GitHubScraper
* fixes for html
* additional commits.
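The Sqlite (#21) entries above describe a transaction builder moved into the database object: queued statements execute after 1000 by default, with an override to execute immediately when set to True. A minimal sketch of that batching pattern, using hypothetical names (`BatchedDB`, the `meta` table schema) since the PR's actual class is not shown here:

```python
import sqlite3

class BatchedDB:
    """Sketch of the commit log's batched-transaction idea, not the real class."""

    BATCH_SIZE = 1000  # the log says statements execute after 1000 by default

    def __init__(self, path=":memory:"):
        # creates the database if it does not exist, as the log describes
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS meta (k TEXT, v TEXT)")
        self.pending = []

    def sql_transaction(self, sql, params=(), immediate=False):
        """Queue a statement; flush once the batch fills or immediate=True."""
        self.pending.append((sql, params))
        if immediate or len(self.pending) >= self.BATCH_SIZE:
            self.flush()

    def flush(self):
        """Run all queued statements in one commit and clear the queue."""
        for sql, params in self.pending:
            self.conn.execute(sql, params)
        self.conn.commit()
        self.pending.clear()
```

Committing once per batch rather than per statement is what makes bulk inserts into SQLite fast; the `immediate` flag mirrors the override the log mentions for tests that need each statement applied right away.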
jgoralcz added a commit that referenced this pull request on Dec 2, 2018, with the same squashed commit message as above.
Documentation framework ready for merge