Disparity between book counts / total sizes #63

Open
codingthat opened this issue Apr 29, 2020 · 11 comments

Comments

@codingthat

codingthat commented Apr 29, 2020

Hi! Great project!

After getting dependencies installed, the download went without errors. But I'm wondering about the result...

https://link.springer.com/search?facet-content-type=%22Book%22&package=mat-covid19_textbooks&%23038;facet-language=%22En%22&%23038;sortOrder=newestFirst&%23038;showAll=true says there are 473 books, 407 of which are in English.

The readme says there are "409 english books (14 GB, both PDF and EPUB)"

My download directory says there are 409 objects, totaling 6.5 GB. Digging into directories shows both PDFs and EPUBs.

  1. Why is the size so much lower than in the readme?
  2. If there are 407 books, how does that become only 409 files, when most books seem to be represented by 2 files each (PDF and EPUB)? (I would have expected more like 814 files, unless only a handful of books were available in both formats.)
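One way to check these numbers is to count the downloaded files per extension and add up their sizes. This is only a minimal sketch: `summarize_downloads` is a hypothetical helper, not part of this repository, and it assumes the books sit somewhere under a single download folder.

```python
from collections import Counter
from pathlib import Path


def summarize_downloads(root):
    """Return (file count per extension, total size in bytes) under root."""
    counts = Counter()
    total_bytes = 0
    # Walk every subfolder, since the books are sorted into directories.
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
            total_bytes += path.stat().st_size
    return counts, total_bytes
```

If every book came in both formats, the `.pdf` and `.epub` counts would match and their sum would be near 814. A file manager that reports "409 objects" may be counting something other than individual files, so the per-extension counts are the more reliable figure to compare.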
@codingthat
Author

The list of books not downloaded appears to consist of these, but I'm not sure what makes them different from the others:

A Beginners Guide to Python 3 Programming
A Beginner's Guide to R
A Beginner's Guide to Scala, Object Orientation and Functional Programming
Abstract Algebra
A Concise Guide to Market Research
A Course in Rasch Measurement Theory
Advanced Guide to Python 3 Programming
Advanced Organic Chemistry
Advanced Organic Chemistry
Advanced Quantum Mechanics
A First Introduction to Quantum Physics
A Modern Introduction to Probability and Statistics
Analysis for Computer Scientists
Analytical Corporate Finance
Analyzing Qualitative Data with MAXQDA
An Anthology of London in Literature, 1558-1914
An Introduction to Biomechanics
An Introduction to Soil Mechanics
An Introduction to Zooarchaeology
Applied Bioinformatics
Applied Chemistry
Applied Linear Algebra
Applied Predictive Modeling
A Pythagorean Introduction to Number Theory
ArcGIS for Environmental and Water Issues
Argumentation Theory: A Pragma-Dialectical Perspective
Astronautics
Automatic Control with Experiments
Bayesian Essentials with R
Bioinformatics for Evolutionary Biologists
Breast Cancer
Brewing Science: A Multidisciplinary Approach
Brownian Motion, Martingales, and Stochastic Calculus
Building Energy Modeling with OpenStudio
Business Ethics - A Philosophical and Behavioral Approach
Calculus With Applications
Chemical and Bioprocess Engineering
Climate Change Science: A Modern Synthesis
Clinical Methods in Medical Family Therapy
Clinical Neuroanatomy
Communication and Bioethics at the End of Life
Complex Analysis
Concepts, Methods and Practical Applications in Applied Demography
Concise Guide to Databases
Conferencing and Presentation English for Young Academics
Control Engineering
Control Engineering: MATLAB Exercises
Criminal Justice and Mental Health
Customer Relationship Management
Data Science and Predictive Analytics
Digital Business Models
Digital Image Processing
Disability and Vocational Rehabilitation in Rural Settings
Educational Technology
Electronic Commerce 2018
Elementary Mechanics Using Matlab
Empathetic Space on Screen
Energy and the Wealth of Nations
Energy Harvesting and Energy Efficiency
Engineering Mechanics 2
Entertainment Science
ENZYMES: Catalysis, Kinetics and Mechanisms
Epidemiological Research: Terms and Concepts
Essentials of Business Analytics
Essentials of Food Science
Evidence-Based Interventions for Children with Challenging Behavior
Evidence-Based Practice in Clinical Social Work
Exam Survival Guide: Physical Chemistry
Excel Data Analysis
Food Chemistry
Food Fraud Prevention
Foundations of Behavioral Health
Foundations of Programming Languages
Fraud and Corruption
Fundamentals of Clinical Trials
Fundamentals of Java Programming
Fundamentals of Multimedia
Fundamentals of Solid State Engineering
Game Theory
Global Supply Chain and Operations Management
Group Theory
Group Theory Applied to Chemistry
Guide to Competitive Programming
Guide to Computer Network Security
Guide to Scientific Computing in C++
Handbook of Biological Confocal Microscopy
Handbook of Evolutionary Research in Archaeology
International Business Management
International Humanitarian Action
Internet of Things From Hype to Reality
Introduction to Artificial Intelligence
Introduction to Data Science
Introduction to Deep Learning
Introduction to Digital Systems Design
Introduction to Embedded Systems
Introduction to Formal Philosophy
Introduction to General Relativity
Introduction to Law
Introduction to Logic Circuits & Logic Design with VHDL
Introduction to Logic Circuits & Logic Design with VHDL
Introduction to Mathematica® for Physicists
Introduction to Parallel Computing
Introduction to Particle and Astroparticle Physics
Introduction to Programming with Fortran
Introduction to Statistics and Data Analysis
Introductory Computer Forensics
Introductory Quantum Mechanics
Intuitive Probability and Random Processes using MATLAB®
Java in Two Semesters
Knowledge Management
Lessons on Synthetic Bioarchitectures
Linear Algebra and Analytic Geometry for Physical Sciences
Logical Foundations of Cyber-Physical Systems
Logistics
Machine Learning in Medicine - a Complete Overview
Managing Media and Digital Organizations
Managing Sustainable Business
Mapping Global Theatre Histories
Market Research
Mathematical Logic
MATLAB for Psychologists
Media and Digital Management
Motivation and Action
Multimedia Big Data Computing for IoT Applications
Nanotechnology: Principles and Practices
Neural Networks and Deep Learning
New Introduction to Multiple Time Series Analysis
Object-Oriented Analysis, Design and Implementation
Of Cigarettes, High Heels, and Other Interesting Things
Off-Grid Electrical Systems in Developing Countries
Optimization of Process Flowsheets through Metaheuristic Techniques
Perceptual Organization
Perspectives on Elderly Crime and Victimization
Pharmaceutical Biotechnology
Pharmaceutical Biotechnology
Philosophical and Mathematical Logic
Philosophy of Race
Physical Asset Management
Physical Chemistry from a Different Angle
Physics from Symmetry
Physics of Oscillations and Waves
Plant Anatomy
Plant Ecology
Plant Physiology, Development and Metabolism
Policing and Minority Communities
Political Social Work
Polymer Chemistry
Polymer Synthesis: Theory and Practice
Practical Electrical Engineering
Principles of Quantum Mechanics
Probability and Statistics for Computer Science
Problems in Classical Electromagnetism
Proofs from THE BOOK
Psychoeducational Assessment and Report Writing
Python For ArcGIS
Python Programming Fundamentals
Quantitative Methods for the Social Sciences
Quantum Mechanics for Pedestrians 1
Quantum Mechanics for Pedestrians 2
Quick Start Guide to Verilog
Quick Start Guide to VHDL
Real Analysis
Recommender Systems
Research Methods for Social Justice and Equity in Education
Research Methods for the Digital Humanities
Scanning Electron Microscopy and X-Ray Microanalysis
School Leadership and Educational Change in Singapore
Social Justice Theory and Practice for Social Work
Social Marketing in Action
Social Psychology in Action
Spine Surgery
Stability and Control of Linear Systems
Statics and Mechanics of Structures
Strategic Human Resource Management and Employment Relations
Strategic Retail Management
Structural Dynamics
Sustainability Science
Systems Programming in Unix/Linux
Teaching Medicine and Medical Ethics Using Popular Culture
The ASCRS Textbook of Colon and Rectal Surgery
The A-Z of the PhD Trajectory
The Finite Element Method and Applications in Engineering Using ANSYS®
The Finite Volume Method in Computational Fluid Dynamics
The Physics of Semiconductors
The Psychology of Social Status
The Python Workbook
Travel Marketing, Tourism Economics and the Airline Product
Witnessing Torture

@pbowyer

pbowyer commented Apr 29, 2020

I ran the script this morning (checkout commit: 40d528f) on Windows and got the full 15.2GB download, 736 Files, 21 Folders.

I installed it following the readme's venv instructions, then ran it like:

python main.py

@codingthat
Author

@pbowyer Which version of Python? I'm on 3.6.9 here.

@codingthat
Author

(I wonder why it didn't get all of them, but didn't error out, either. If I start again will it redownload ones that are already complete?)

@pbowyer

pbowyer commented Apr 29, 2020

@codingthat

Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)] :: Anaconda, Inc. on win32

@Artneo16

Hi @pbowyer

Would you be so kind as to explain step by step how to do it? I get an error in CMD when I try to run the code.

Thanks in advance!!

BR

@pbowyer

pbowyer commented Apr 30, 2020

Hi @Artneo16

Certainly, here's what I did:

  1. First, check you have a recent version of Python installed. To do this, type python at the command prompt. I'm using Python 3.7.0, and recommend you use that or a newer version. [Hint: to quit the Python terminal you've launched, type quit().]
  2. Now clone or download this repository
  3. Next I followed the instructions in this repository's README: https://github.com/alexgand/springer_free_books#virtual-environment-on-windows-python-3x. I ran the following commands (run them one at a time):
    python -m venv .venv
    .venv\Scripts\activate.bat
    pip install -r requirements.txt
    
  4. If there were no errors, you can now download the books:
    python main.py
    
    This will take some time.
  5. Finally, once the books have downloaded, deactivate the virtual environment:
    .venv\Scripts\deactivate.bat
    

Good luck!

@Artneo16

Thank you very much!

@StreetGuru

StreetGuru commented May 1, 2020

> I ran the script this morning (checkout commit: 40d528f) on Windows and got the full 15.2GB download, 736 Files, 21 Folders.

So strange - I ran the script yesterday evening (on Linux) and got 16.4GB, 757 Files, 21 Folders (edit: 407 ebooks were found by the script).

A few ebooks seem to be available only in PDF format (I checked the Springer website, and those books do indeed have only a PDF download link).
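To list which books ended up with a PDF but no EPUB, the files can be grouped by name. This is a rough sketch with hypothetical helper names, and it assumes the script saves both formats of a book in the same folder under the same base file name, which I have not verified.

```python
from collections import defaultdict
from pathlib import Path


def formats_by_book(root):
    """Map each (folder, file stem) pair to the set of extensions found for it."""
    formats = defaultdict(set)
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in {".pdf", ".epub"}:
            formats[(path.parent, path.stem)].add(suffix)
    return formats


def pdf_only(root):
    """Return the sorted stems of books that have a PDF but no EPUB."""
    return sorted(
        stem
        for (_, stem), exts in formats_by_book(root).items()
        if exts == {".pdf"}
    )
```

Comparing the `pdf_only` output from two machines would show whether differing file counts come from missing EPUBs or from something else.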

@cgavir29

cgavir29 commented May 1, 2020

> > I ran the script this morning (checkout commit: 40d528f) on Windows and got the full 15.2GB download, 736 Files, 21 Folders.
>
> So strange - I ran the script yesterday evening (on linux) and got 16.4GB, 757 Files, 21 Folders (edit: 407 ebooks were found on the script).
>
> There are a few ebooks that seem to only be available in pdf format (I've checked Springer website and indeed they only have a pdf link for download).

I ran it once and got 736 Files, 21 Folders, just as @pbowyer did. However, when I ran it again I got 12 more very small .epub files (under 20k each) that wouldn't even open. Maybe something similar happened to you.

If you want to find the small ones, you can run find . -name "*.epub" -size -20k in the root of the folder where you downloaded them. To delete all the results, add the -delete flag.
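For anyone on Windows without find, here is a rough Python equivalent. It is only a sketch: `small_epubs` is a hypothetical helper, and the 20 KiB threshold only approximates what find's -size -20k matches. Since an EPUB is a ZIP container, it also reports whether each small file is at least a valid ZIP archive, which should tell a broken download apart from a genuinely tiny book.

```python
import zipfile
from pathlib import Path


def small_epubs(root, max_bytes=20 * 1024):
    """Return (path, size in bytes, is valid ZIP) for each small .epub under root."""
    results = []
    for path in sorted(Path(root).rglob("*.epub")):
        size = path.stat().st_size
        if size < max_bytes:
            # A real EPUB is a ZIP archive; an error page saved as .epub is not.
            results.append((path, size, zipfile.is_zipfile(path)))
    return results
```

This only lists the files. After reviewing the output, each unwanted file can be removed with `path.unlink()`.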

@StreetGuru

> I ran it once and got 736 Files, 21 Folders just as @pbowyer. However, I ran it again and got 12 more pretty small .epub files <20k that didn't even open. Maybe something similar happened to you.
>
> If you want to find the smalls ones you can do find . -name "*.epub" -size -20k in the root of the folder where you downloaded them. To delete all results use the flag -delete.

I don't find any small EPUB files; it seems to have downloaded everything that is available. The strange thing is that my download folder is bigger and has more files than @pbowyer's, and I thought he had downloaded the entire collection?
