# Get started with SoilPulse

## Step 1 - ingest data into soilpulsecore

This step shows how to provide your data (and any existing metadata) to create a soilpulsecore project. A project can be started from a DOI, from a URL, or from uploaded files.

In [1]:
# first we import all relevant soilpulsecore functionalities

from soilpulsecore.project_management import *
from soilpulsecore.resource_managers.filesystem import *
from soilpulsecore.resource_managers.mysql import *
from soilpulsecore.resource_managers.xml import *
from soilpulsecore.resource_managers.data_structures import *
from soilpulsecore.resource_managers.json import *
from soilpulsecore.data_publishers import *
from soilpulsecore.metadata_scheme import *
from soilpulsecore.db_access import EntityKeywordsDB, NullConnector

> Crawler type 'zero' registered.
Container type 'filesystem' registered
* Keywords database soilpulse\databases\keywords_filesystem registered as 'filesystem'
Container type 'file' registered
Container type 'directory' registered
Container type 'archive' registered
> Crawler type 'filesystem' registered.
> Crawler type 'csv' registered.
> Crawler type 'txt' registered.
Container type 'mysql' registered
* Keywords database soilpulse\databases\keywords_mysql registered as 'mysql'
Container type 'xml' registered
* Keywords database soilpulse\databases\keywords_xml registered as 'xml'
Container type 'table' registered
> Crawler type 'table' registered.
Container type 'column' registered
> Crawler type 'column' registered.
Container type 'json' registered
* Keywords database soilpulse\databases\keywords_json registered as 'json'
> Crawler type 'json' registered.
Publisher 'Zenodo' registered


In [2]:
# then we define some example records (DOI, DOI with URL, URL only, file upload) that can be used to create a project
example_doi = {"name": "Soil erosion data of TUBAF rain simulators in Lenz, 2022",
               "doi": "10.5281/zenodo.6654150"}
example_doi_url = {"name": "Rainfall simulation data Ries et al. 2019",
                   "doi": "10.6094/unifr/151460",
                   "url": "https://freidok.uni-freiburg.de/files/151460/twflMtwtvn01bDCC/Extreme_rainfall_experiment_data_06122019.zip"}
example_url = {"name": "Soil erosion data in Punjab, India, Lenz et. al",
               "url": "https://www.mdpi.com/2076-3263/8/11/396/s1"}
example_file_upload = {"name": "CTU soil erosion data example"}
example_reload_soilpulse_project = {"": ""}
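
The records above differ only in which keys they carry (`doi`, `url`, or just `name`). As a rough illustration of how such a record could be inspected before it is passed to `ProjectManager`, here is a minimal sketch. The helper `ingestion_route` is hypothetical and not part of soilpulsecore, and the rule that a DOI takes precedence over a URL when both are present is an assumption.

```python
def ingestion_route(record):
    """Guess which ingestion route a record dict describes.

    Hypothetical helper for illustration only; soilpulsecore may decide
    this differently. Assumption: a DOI wins over a URL if both are given.
    """
    if record.get("doi"):
        return "doi"
    if record.get("url"):
        return "url"
    if record.get("name"):
        # only a name is given, so files would have to be uploaded
        return "file_upload"
    return "unknown"


records = {
    "example_doi": {"name": "Soil erosion data (DOI record)",
                    "doi": "10.5281/zenodo.6654150"},
    "example_url": {"name": "Soil erosion data (URL record)",
                    "url": "https://www.mdpi.com/2076-3263/8/11/396/s1"},
    "example_file_upload": {"name": "CTU soil erosion data example"},
    "example_reload_soilpulse_project": {"": ""},
}

for label, record in records.items():
    print(f"{label}: {ingestion_route(record)}")
```

A check like this can make it obvious early on that a record such as the empty reload placeholder carries nothing to ingest.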

### by DOI

In [3]:

# then we create a new soilpulsecore project from the given record and download its published files
dbcon = NullConnector()
user_id = 1
project_doi = ProjectManager(dbcon, user_id, **example_doi)

project_doi.downloadPublishedFiles()


failed to load concept vocabulary 'AGROVOC' from 'vocabularies\agrovoc.json'
failed to load concept vocabulary 'TestConceptVocabulary' from 'vocabularies\_concepts_vocabulary_1.json'
failed to load method vocabulary 'TestMethodsVocabulary' from 'vocabularies\_methods_vocabulary_1.json'
loaded methods vocabularies: 
failed to load units vocabulary 'TestUnitsVocabulary' from 'vocabularies\_units_vocabulary_1.json'
loaded units vocabularies: 
doi: '10.5281/zenodo.6654150'

Obtaining metadata from DOI registration agency ...
 ... successful

File 'DOI_metadata.json' successfuly saved.
File 'Publisher_metadata.json' successfuly saved.
downloading remote files to SoilPulse storage ...
	10-toolboxvignette.Rmd - unsupported Crawler subclass special type 'Rmd' (registered types are: 'zero','filesystem','csv','txt','table','column','json') - 'zero crawler' will be used instead.
No content analysis procedure defined for crawler type 'zero'
	06-lookout.Rmd - unsupported Crawler subclass special ty

['C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\10-toolboxvignette.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\06-lookout.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\index.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\lenz2022.zip',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\09-database.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\11-varrain.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\08-E3DIssues.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\_output.yml',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\07-references.Rmd',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\preamble.tex',
 'C:\\Users\\JL\\SoilPulse\\project_files\\temp_1\\03a-code.Rmd']
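
The log above reports that files with an unsupported special type such as `Rmd` fall back to the 'zero' crawler. To preview which of the downloaded files would hit that fallback, the returned path list can be grouped by file extension. This is a sketch only: `group_by_crawler` is a hypothetical helper, and treating `csv`, `txt` and `json` as the extension-based crawler types is an assumption drawn from the registered types in the log. How soilpulsecore actually handles archives such as `lenz2022.zip` is not shown here.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

# Assumption: of the registered crawler types in the log, these three
# correspond directly to file extensions.
EXTENSION_TYPES = {"csv", "txt", "json"}


def group_by_crawler(paths):
    """Group file names by the crawler type their extension suggests.

    Anything without a matching extension is filed under 'zero',
    mirroring the fallback message in the log output.
    """
    groups = defaultdict(list)
    for p in paths:
        # PureWindowsPath parses the backslash paths on any platform
        path = PureWindowsPath(p)
        ext = path.suffix.lstrip(".").lower()
        key = ext if ext in EXTENSION_TYPES else "zero"
        groups[key].append(path.name)
    return dict(groups)


downloaded = [
    r"C:\Users\JL\SoilPulse\project_files\temp_1\index.Rmd",
    r"C:\Users\JL\SoilPulse\project_files\temp_1\lenz2022.zip",
    r"C:\Users\JL\SoilPulse\project_files\temp_1\_output.yml",
]

for crawler, names in group_by_crawler(downloaded).items():
    print(crawler, names)
```

In the notebook, the list returned by `project_doi.downloadPublishedFiles()` could be passed in place of `downloaded`.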

### by URL

In [4]:
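# create a project from a record that has a URL but no DOI
dbcon = NullConnector()
user_id = 1
project_url = ProjectManager(dbcon, user_id, **example_url)

# try to download the published files
project_url.downloadPublishedFiles()

# print the container tree and save the project record
project_url.showContainerTree()
project_url.updateDBrecord()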


failed to load concept vocabulary 'AGROVOC' from 'vocabularies\agrovoc.json'
failed to load concept vocabulary 'TestConceptVocabulary' from 'vocabularies\_concepts_vocabulary_1.json'
failed to load method vocabulary 'TestMethodsVocabulary' from 'vocabularies\_methods_vocabulary_1.json'
loaded methods vocabularies: 
failed to load units vocabulary 'TestUnitsVocabulary' from 'vocabularies\_units_vocabulary_1.json'
loaded units vocabularies: 
doi: 'None'
Empty DOI provided. DOI metadata were not retrieved.
The list of published files is empty.


Soil erosion data in Punjab, India, Lenz et. al
container tree:
--------------------------------------------------------------------------------



Saving project "Soil erosion data in Punjab, India, Lenz et. al" with ID 2 ... 
	(no containers to save)
	(no datasets to save)
	concepts vocabulary saved
	methods vocabulary saved
	units vocabulary saved
 ... successful.


## Step 2 - analyze file system structure