creating a cheatsheet for extracting & pushing clips onto zooniverse #181

Closed · alecristia opened this issue Apr 8, 2021 · 32 comments · Fixed by #182
Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@alecristia (Collaborator)

Hi,
I'm trying to create a cheatsheet for myself for extracting & pushing clips onto zooniverse. I'll always do this on oberon, so I'll only consider that case.

So far I have:

datalad install git@github.com:LAAC-LSCP/solomon-data.git
cd solomon-data
source ~/ChildProjectVenv/bin/activate
datalad run-procedure setup

But that last step fails:

[INFO ] Running procedure setup
[INFO ] == Command start (output follows) =====
[INFO ] Could not enable annex remote cluster. This is expected if cluster is a pure Git remote, or happens if it is not accessible.
Traceback (most recent call last):
  File "/scratch1/home/acristia/solomon-data/.datalad/procedures/setup.py", line 25, in <module>
    url = url
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/interface/utils.py", line 482, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/interface/utils.py", line 470, in return_func
    results = list(results)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/interface/utils.py", line 401, in generator_func
    allkwargs):
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/interface/utils.py", line 557, in process_results
    for res in results:
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/distribution/siblings.py", line 265, in __call__
    **res_kwargs):
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/distribution/siblings.py", line 588, in configure_remote
    ds.repo.set_preferred_content(prop, var, '.' if name == 'here' else name)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/support/annexrepo.py", line 2570, in set_preferred_content
    return self.call_annex_oneline([property, remote or '.', expr])
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/support/annexrepo.py", line 1296, in call_annex_oneline
    l for l in self.call_annex_items_(args, files=files)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/support/annexrepo.py", line 1296, in <listcomp>
    l for l in self.call_annex_items_(args, files=files)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/support/annexrepo.py", line 1258, in call_annex_items_
    protocol=StdOutErrCapture)['stdout']
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/support/annexrepo.py", line 987, in _call_annex
    **kwargs)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/datalad/cmd.py", line 412, in run
    **results,
datalad.support.exceptions.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none annex wanted cluster 'include=
' -c annex.dotfiles=true -c 'remote.origin.annex-ssh-options=-o ControlMaster=auto -S /scratch1/home/acristia/.cache/datalad/sockets/bd68b2c3' -c annex.retry=3 -c 'remote.cluster.annex-ssh-options=-o ControlMaster=auto -S /scratch1/home/acristia/.cache/datalad/sockets/0199f269'' failed with exitcode 1 under /scratch1/home/acristia/solomon-data [err: 'Unable to parse git config from cluster
ssh: Could not resolve hostname foberon: Name or service not known
ConnectionOpenFailedError: 'ssh -fN -o ControlMaster=auto -o ControlPersist=15m -o ControlPath=/scratch1/home/acristia/.cache/datalad/sockets/0199f269 foberon' failed with exitcode 255 [Failed to open SSH connection (could not start ControlMaster process)]
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
git-annex: cannot determine uuid for cluster (perhaps you need to run "git annex sync"?)']
[INFO ] == Command exit (modification check follows) =====
CommandError: '/scratch1/home/acristia/solomon-data/.datalad/procedures/setup.py /scratch1/home/acristia/solomon-data foberon' failed with exitcode 1 under /scratch1/home/acristia/solomon-data
(ChildProjectVenv) [acristia@oberon solomon-data]$ datalad run-procedure setup
[INFO ] Running procedure setup
[INFO ] == Command start (output follows) =====
.: cluster(+) [/scratch1/data/laac_data/solomon-data (git)]
[INFO ] Configure additional publication dependency on "cluster"
.: origin(-) [git@github.com:LAAC-LSCP/solomon-data.git (git)]
[INFO ] == Command exit (modification check follows) =====

Just in case, I pushed on, and the next step worked:

datalad get recordings/converted

I'm uncertain as to the following step. I think I should select segments somehow -- but in the docs the next step is the chunkification of the segments.

About chunkification, I currently have this draft of the command:

cd .. #to find myself at the same level as solomon-data, since just now I was inside solomon-data
child-project zooniverse extract-chunks solomon-data --keyword talkerNtype --chunks-length 500 --segments segments.csv --destination solomon-data/annotations/zooniverse/raw --batch-size 1000

Should destination be inside solomon-data or somewhere else? What happens if I leave it unspecified? Same for chunks-length & batch-size. Could/should we have a default behavior that means that the chunks will be created inside solomon-data in some place such that the actual mp3/wavs don't get included in the data but the metadata etc does?

What is batch-size, actually? Why is it declared in the chunkification stage in addition to the upload stage? I just saw that this is optional in the upload stage -- shouldn't it be mandatory (or default to 1000) there?

The step after that is chunk upload. Here is my command draft:

child-project zooniverse upload-chunks solomon-data --chunks solomon-data/annotations/zooniverse/raw/chunks.csv
                                              --project-id 14957
                                              --set-prefix ac_20210408

Have we decided on a naming convention for the prefix?

The next step is to create a record that I did this, by updating the dataset. Here is my command draft for that:

cd solomon-data
datalad save annotations/zooniverse/raw -m "adding record of zoo chunks"

Eventually, I'll get classifications:

child-project zooniverse retrieve-classifications solomon-data --project-id 14957

And repeat the data update.

cd solomon-data
datalad save annotations/zooniverse/raw -m "adding record of zoo chunks - annotated"
@alecristia added the help wanted and question labels on Apr 8, 2021
@lucasgautheron (Collaborator) commented Apr 8, 2021

Hi,
I'm trying to create a cheatsheet for myself for extracting & pushing clips onto zooniverse. I'll always do this on oberon, so I'll only consider that case.

So far I have:

datalad install git@github.com:LAAC-LSCP/solomon-data.git
cd solomon-data
source ~/ChildProjectVenv/bin/activate
datalad run-procedure setup

But that last step fails:

I think you may have accidentally run datalad run-procedure setup foberon (probably a copy-paste from the doc) right before datalad run-procedure setup (which worked, as you can see, which is why the following steps work).

Just in case, I pushed on, and the next step worked:

datalad get recordings/converted

I'm uncertain as to the following step. I think I should select segments somehow -- but in the docs the next step is the chunkification of the segments.

It is true that you need to provide segments beforehand, which the doc currently does not state clearly. But if you do not provide these segments, you won't be able to go any further.

About chunkification, I currently have this draft of the command:

cd .. #to find myself at the same level as solomon-data, since just now I was inside solomon-data
child-project zooniverse extract-chunks solomon-data --keyword talkerNtype --chunks-length 500 --segments segments.csv --destination solomon-data/annotations/zooniverse/raw --batch-size 1000

Should destination be inside solomon-data or somewhere else? What happens if I leave it unspecified? Same for chunks-length & batch-size. Could/should we have a default behavior that means that the chunks will be created inside solomon-data in some place such that the actual mp3/wavs don't get included in the data but the metadata etc does?

You need to define a destination (otherwise, an error will be thrown, and the script will stop)

It is up to the user to decide where to store the output, and it might not be within solomon-data; e.g., if you are developing an analysis on the side, you may have imported solomon-data as a subdataset, and the chunks will preferably live somewhere in your analysis folder. We do not expect every user to push their own chunks to the original dataset in the general case. (Also, honestly, the audio chunks do not need to be kept once they have been uploaded.)
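
For reference, a minimal sketch of that subdataset layout (the analysis dataset name is hypothetical):

datalad create my-analysis  # a hypothetical analysis dataset
cd my-analysis
datalad install -d . git@github.com:LAAC-LSCP/solomon-data.git  # registers solomon-data as a subdataset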

However, in case you want to push the chunks to the dataset, I am not sure annotations is the best-fitting place for them. I think a better design would be to create a 'samples' subfolder, as in #148 (comment).

You could then set the destination to solomon-data/samples/high-volubility/chunks for instance

What is batch-size, actually? Why is it declared in the chunkification stage in addition to the upload stage? I just saw that this is optional in the upload stage -- shouldn't it be mandatory (or default to 1000) there?

Batch size defines how many chunks will be grouped and uploaded together. This reproduces the behavior of Chiara's script, and is apparently needed because of Zooniverse upload rate quotas. --batch-size defines how many chunks each batch should contain; at the upload step, you then choose how many of these batches to upload. This way, you can upload n batches the first day, then n more batches the second day, etc. Maybe we could avoid that and have a single option specifying how many chunks to upload during the upload step - in that case, we could drop the batch system. Let me rethink this!
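
To illustrate the arithmetic (a sketch only, not ChildProject's actual code): with --batch-size 1000, 2500 chunks are grouped into three batches, and each upload run then consumes some number of whole batches.

# illustrative sketch of the batching logic described above, not the package's implementation
chunks = list(range(2500))  # pretend 2500 chunks were extracted
batch_size = 1000
batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
print(len(batches))  # 3 batches: 1000 + 1000 + 500 chunks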

The step after that is chunk upload. Here is my command draft:

child-project zooniverse upload-chunks solomon-data --chunks solomon-data/annotations/zooniverse/raw/chunks.csv
                                              --project-id 14957
                                              --set-prefix ac_20210408

Have we decided on a naming convention for the prefix?

We have not. Should we?

The next step is to create a record that I did this, but updating the data. Here is my command draft for that:

cd solomon-data
datalad save annotations/zooniverse/raw -m "adding record of zoo chunks"

This is the equivalent of git add & git commit; you will also need to push the data at some point (datalad push)

Eventually, I'll get classifications:

child-project zooniverse retrieve-classifications solomon-data --project-id 14957

And repeat the data update.

cd solomon-data
datalad save annotations/zooniverse/raw -m "adding record of zoo chunks - annotated"

You will also have to set the destination for child-project zooniverse retrieve-classifications, e.g.:

child-project zooniverse retrieve-classifications solomon-data --destination solomon-data/samples/high-volubility/classifications_2021-04-10.csv --project-id XXX

PS: you don't need to cd out of solomon-data; you could just do child-project validate . for instance.
PPS: if you write a cheatsheet for Zooniverse, can you please share it? Then I can adapt it into a tutorial for the docs.

@lucasgautheron linked a pull request on Apr 8, 2021 that will close this issue
@lucasgautheron (Collaborator) commented Apr 8, 2021

I have found a workaround to avoid the batch system, which I implemented in #182.

You can try it by installing the package from:

pip install git+https://github.com/LAAC-LSCP/ChildProject.git@zooniverse/improvements --upgrade

Below is the updated documentation:

$ child-project zooniverse extract-chunks --help
usage: child-project zooniverse extract-chunks [-h] --keyword KEYWORD
                                               [--chunks-length CHUNKS_LENGTH]
                                               [--chunks-min-amount CHUNKS_MIN_AMOUNT]
                                               --segments SEGMENTS
                                               --destination DESTINATION
                                               [--exclude-segments EXCLUDE_SEGMENTS [EXCLUDE_SEGMENTS ...]]
                                               [--threads THREADS]
                                               path

positional arguments:
  path                  path to the dataset

optional arguments:
  -h, --help            show this help message and exit
  --keyword KEYWORD     export keyword
  --chunks-length CHUNKS_LENGTH
                        chunk length (in milliseconds). if <= 0, the segments
                        will not be split into chunks
  --chunks-min-amount CHUNKS_MIN_AMOUNT
                        minimum amount of chunks to extract from a segment
  --segments SEGMENTS   path to the input segments dataframe
  --destination DESTINATION
                        destination
  --exclude-segments EXCLUDE_SEGMENTS [EXCLUDE_SEGMENTS ...]
                        segments to exclude before sampling
  --threads THREADS     how many threads to run on
$ child-project zooniverse upload-chunks --help
usage: child-project zooniverse upload-chunks [-h] --chunks CHUNKS
                                              --project-id PROJECT_ID
                                              --set-name SET_NAME
                                              [--amount AMOUNT]
                                              [--zooniverse-login ZOONIVERSE_LOGIN]
                                              [--zooniverse-pwd ZOONIVERSE_PWD]

optional arguments:
  -h, --help            show this help message and exit
  --chunks CHUNKS       path to the chunk CSV dataframe
  --project-id PROJECT_ID
                        zooniverse project id
  --set-name SET_NAME   subject set display name
  --amount AMOUNT       amount of chunks to upload
  --zooniverse-login ZOONIVERSE_LOGIN
                        zooniverse login. If not specified, the program
                        attempts to get it from the environment variable
                        ZOONIVERSE_LOGIN instead
  --zooniverse-pwd ZOONIVERSE_PWD
                        zooniverse password. If not specified, the program
                        attempts to get it from the environment variable
                        ZOONIVERSE_PWD instead

@lucasgautheron (Collaborator) commented Apr 13, 2021

I have just realised I had forgotten to answer about chunkification. If you do not specify a value for --chunks-length, input segments currently will not be split (because the default value is zero), but we could change the default to a non-zero value (e.g. 500).
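
For illustration, the splitting rule could be sketched like this (a hypothetical helper, not the package's implementation):

# hypothetical helper illustrating the --chunks-length rule described above
def split_segment(onset, offset, chunks_length):
    # split a segment (in milliseconds) into fixed-length chunks;
    # with chunks_length <= 0, keep the segment whole (the current default)
    if chunks_length <= 0:
        return [(onset, offset)]
    return [(t, min(t + chunks_length, offset))
            for t in range(onset, offset, chunks_length)]

print(split_segment(0, 1200, 500))  # [(0, 500), (500, 1000), (1000, 1200)]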

@alecristia (Collaborator, Author) commented Apr 21, 2021

THIS IS THE MOST UP-TO-DATE VERSION OF THE CHEAT SHEET -- I HAVE NOT TESTED THE WHOLE THING & the chunkify section needs a second check. Consider also replacing the scripts with commands in other sections.

cheatsheet for zooniverse clip pushing

This is a cheatsheet for extracting & pushing clips onto zooniverse. It works on oberon; it does not work on my home computer (git-annex cannot be installed on my OS, and there is not enough space for the audios).

I've adapted the zoo example python script and the zoo-phon-data script. I created two separate scripts: one for sampling, one for uploading.

preparation

I start by installing the dataset.

datalad install git@github.com:LAAC-LSCP/solomon-data.git
cd solomon-data
source ~/ChildProjectVenv/bin/activate
datalad run-procedure setup

Then I get the recordings & the VTC annotations, and validate.

datalad get recordings/converted
datalad get annotations/vtc/converted
child-project validate .

Both of those steps can be skipped if I already have the data.

Preparing the folder

I'm about to extract many files that can be re-generated if need be, and that take up space and slow down indexing, so even before I generate them, I want to tell DataLad not to pay attention to them. This way, they won't get tracked or pushed. (For more information on avoiding DataLad tracking, see the DataLad manual.) For our purposes, all we need to do is the following:

echo "samples/CHI_FEM/*" >> .gitignore  # add the folder that we will create in the next step to the list of folders to ignore
datalad save -m "ignore extracts folder" .gitignore 

sampling

Then I sample segments, chunkify, and upload.

For sampling, I'll do 250 random CHI vocs + 250 random FEM vocs. I decided to store the sound files in a folder called samples/CHI_FEM/, which I'll push. My adapted script, therefore, looks like this:

#!/usr/bin/env python3
from ChildProject.projects import ChildProject
from ChildProject.annotations import AnnotationManager
from ChildProject.pipelines.zooniverse import ZooniversePipeline
from ChildProject.pipelines.samplers import RandomVocalizationSampler

import argparse
import os
import pandas as pd

project = ChildProject('.')
project.read()

random_sampler = RandomVocalizationSampler(
    project,
    annotation_set = 'vtc',
    target_speaker_type = ['CHI'],
    sample_size = 250
)
random_sampler.sample()
os.makedirs('samples/CHI_FEM/random', exist_ok = True)
random_sampler.segments[['recording_filename', 'segment_onset', 'segment_offset']].to_csv('samples/CHI_FEM/random/samples.csv', index = False)

random_sampler = RandomVocalizationSampler(
    project,
    annotation_set = 'vtc',
    target_speaker_type = ['FEM'],
    sample_size = 250
)
random_sampler.sample()
random_sampler.segments[['recording_filename', 'segment_onset', 'segment_offset']].to_csv('samples/CHI_FEM/random/samples2.csv', index = False)

a = pd.read_csv('samples/CHI_FEM/random/samples.csv')
b = pd.read_csv('samples/CHI_FEM/random/samples2.csv')
c = pd.concat([a, b], join='outer')
c.to_csv("samples/CHI_FEM/random/samples.csv", index = False)

And I call it like this because all the paths are defined inside the code:

python scripts/sample_segments.py 

chunkify (not tested)

For chunkification, I'll use 500 ms chunks and only 2 threads, as I'm on a smaller computer than the cluster. My script looks like this:

#!/usr/bin/env python3
from ChildProject.projects import ChildProject
from ChildProject.annotations import AnnotationManager
from ChildProject.pipelines.zooniverse import ZooniversePipeline

import argparse
import os
import pandas as pd

project = ChildProject('.')
project.read()


zooniverse = ZooniversePipeline()

chunks_path = zooniverse.extract_chunks(
    path = project.path,
    destination = 'samples/CHI_FEM/random/',
    keyword = 'ac_20210421a',
    segments = 'samples/CHI_FEM/random/samples.csv',
    chunks_length = 500,
    chunks_min_amount = 2,
    threads = 2,
    profile = 'standard'
)

This step takes a while, so to be on the safe side, I first start a screen, activate the environment, and call the script (again with no arguments, because all the paths are defined inside the code):

screen
source ~/ChildProjectVenv/bin/activate
python scripts/chunkify_segments.py 

NOTE! One problem with doing the above is that I didn't explicitly define a name for the chunks.csv file to be generated. So alternatively, next time, I could instead do:

screen
source ~/ChildProjectVenv/bin/activate
child-project zooniverse extract-chunks . --segments 'samples/CHI_FEM/random/samples.csv' --chunks-length 500 --chunks-min-amount 2 --threads 2 --profile 'standard' --keyword 'ac_20210421a' --destination  'samples/CHI_FEM/random/'

upload

For upload, I target our new project and don't batch the chunks, as batching is no longer needed. I directly call the command:

child-project zooniverse upload-chunks --set-name ac_20210430 --chunks 'samples/CHI_FEM/random/chunks_20210430_112933.csv' --project-id 14957 --zooniverse-login acristia --zooniverse-pwd MYPASSWORD

record actions

The next step is to create a record that I did this, by updating the dataset. Here is my command draft for that:

datalad save -m "adding record of zoo chunks"
datalad push

get classifications

Eventually, I'll get classifications:

child-project zooniverse retrieve-classifications solomon-data --destination solomon-data/samples/CHI_FEM/random/classifications_2021-04-10.csv --project-id 14957

And repeat the data update.

datalad save -m "adding record of coded zoo chunks"
datalad push

@lucasgautheron (Collaborator)

That seems good (a few details: sample_size should be 500 instead of 250 according to your description, and the destination of zooniverse classifications should be something like samples/random instead of samples/high-volubility for consistency, but these are all minor points/probably typos).

However, there are a few issues:

  • the output of both samplers is written to the same file, so FEM segments will overwrite CHI segments.
  • in the extract_chunks() call, segments is set to 'segments.csv' but it does not exist (segments are written to samples/CHI_FEM/random/samples.csv instead)
  • I strongly suggest you run this on the cluster. Then you don't have to upload hundreds of GBs of recordings, and it will be much faster because sampling and chunk extraction can run on many cores in parallel.

@alecristia (Collaborator, Author)

thanks for the proofing!

I see in the sampler docs that I can specify multiple talkers. If I changed my code to:

random_sampler = RandomVocalizationSampler(
    project,
    annotation_set = 'vtc',
    target_speaker_type = ['CHI','FEM'],
    sample_size = 500
)

will I get 250 of each, or no assurance on this?

@lucasgautheron (Collaborator)

Nope, it will sample uniformly among the union of CHI and FEM segments.

So you need to sample them separately if you want the same amount of each.

You can then concatenate the dataframes and save them as a single dataframe if that is more convenient for you, however.
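
A minimal sketch of that approach, assuming the same RandomVocalizationSampler API as in your script:

# sketch: sample CHI and FEM separately, then concatenate into one dataframe
import pandas as pd
from ChildProject.projects import ChildProject
from ChildProject.pipelines.samplers import RandomVocalizationSampler

project = ChildProject('.')
project.read()

samples = []
for speaker in ['CHI', 'FEM']:
    sampler = RandomVocalizationSampler(
        project,
        annotation_set = 'vtc',
        target_speaker_type = [speaker],
        sample_size = 250
    )
    sampler.sample()
    samples.append(sampler.segments[['recording_filename', 'segment_onset', 'segment_offset']])

pd.concat(samples).to_csv('samples/CHI_FEM/random/samples.csv', index = False)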

@alecristia (Collaborator, Author)

roger! I fixed a couple of typos and I'm close, but:

$ python scripts/sample_segments.py

/Users/acristia/ChildProjectVenv/lib/python3.6/site-packages/pandas/core/frame.py:4174: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,
Traceback (most recent call last):
  File "scripts/sample_segments.py", line 20, in <module>
    random_sampler.sample()
  File "/Users/acristia/ChildProjectVenv/lib/python3.6/site-packages/ChildProject/pipelines/samplers.py", line 180, in sample
    self.segments = self.segments.groupby('recording_filename').sample(self.sample_size)
AttributeError: 'NoneType' object has no attribute 'groupby'

@lucasgautheron (Collaborator)

Your script is working for me - at least the sampling part; I have not tested the zooniverse part.

A few suggestions:

  • Try upgrading your package
  • make sure VTC annotations are installed (e.g. do more annotations/vtc/converted/*)
  • random voc sampling can now be parallelised (with the threads argument, as for zooniverse)
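
For the last point, a sketch assuming the updated sampler accepts a threads argument as stated above (the exact signature may differ):

random_sampler = RandomVocalizationSampler(
    project,
    annotation_set = 'vtc',
    target_speaker_type = ['CHI'],
    sample_size = 250,
    threads = 4  # assumption: parallelises sampling, as for the zooniverse pipeline
)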

@alecristia (Collaborator, Author)

I tried from oberon, where the error does NOT replicate - but I get a new error. On oberon, I upgraded the package, checked the VTC annotations (they are there, e.g. annotations/vtc/converted/01_CW01_CH01_FB03_FB11_190622_0_0.csv), tried again, and I still get the same oberon error (not the same error I got on my home pc):

$ python scripts/sample_segments.py

/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/frame.py:4174: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,
Traceback (most recent call last):
  File "scripts/sample_segments.py", line 20, in <module>
    random_sampler.sample()
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/pipelines/samplers.py", line 180, in sample
    self.segments = self.segments.groupby('recording_filename').sample(self.sample_size)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2865, in sample
    for (_, obj), w in zip(self, ws)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2865, in <listcomp>
    for (_, obj), w in zip(self, ws)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/generic.py", line 4993, in sample
    locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
  File "mtrand.pyx", line 954, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'

My naïve reading of the error is that there are fewer vocalizations than the number I asked for, correct?
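
(For intuition, a toy reproduction of the failure mode plus a guarded workaround; this is not necessarily the fix the package ships:)

import pandas as pd

# toy data: recording 'a.wav' has only 3 segments, fewer than sample_size
segments = pd.DataFrame({
    'recording_filename': ['a.wav'] * 3 + ['b.wav'] * 10,
    'segment_onset': range(13),
})
sample_size = 5

# segments.groupby('recording_filename').sample(sample_size) raises
# ValueError: Cannot take a larger sample than population when 'replace=False'
sampled = segments.groupby('recording_filename', group_keys=False).apply(
    lambda g: g.sample(min(len(g), sample_size))  # cap at each group's size
)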

@lucasgautheron (Collaborator) commented Apr 21, 2021

You are right. However, this should not happen with the latest version of the package (I can see from the error that the code is outdated).

Can you try upgrading again?

pip3 install git+https://github.com/LAAC-LSCP/ChildProject.git --upgrade

@alecristia (Collaborator, Author)

$ pip3 install git+https://github.com/LAAC-LSCP/ChildProject.git --upgrade

Collecting git+https://github.com/LAAC-LSCP/ChildProject.git
Cloning https://github.com/LAAC-LSCP/ChildProject.git to /tmp/pip-req-build-3ogs07x4
Running command git clone -q https://github.com/LAAC-LSCP/ChildProject.git /tmp/pip-req-build-3ogs07x4
Requirement already satisfied: pandas in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (1.1.4)
Requirement already satisfied: xlrd in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (1.2.0)
Requirement already satisfied: jinja2 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (2.11.2)
Requirement already satisfied: numpy>=1.16.5 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (1.19.4)
Requirement already satisfied: pympi-ling in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (1.69)
Requirement already satisfied: lxml in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (4.6.3)
Requirement already satisfied: sox in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (1.4.1)
Requirement already satisfied: datalad in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (0.14.1)
Requirement already satisfied: requests<2.25.0 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (2.24.0)
Requirement already satisfied: PyYAML in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (5.4.1)
Requirement already satisfied: panoptes-client in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (1.3.0)
Requirement already satisfied: pydub in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from ChildProject==0.0.1) (0.25.1)
Collecting importlib-resources
Downloading importlib_resources-5.1.2-py3-none-any.whl (25 kB)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from requests<2.25.0->ChildProject==0.0.1) (1.25.11)
Requirement already satisfied: idna<3,>=2.5 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from requests<2.25.0->ChildProject==0.0.1) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from requests<2.25.0->ChildProject==0.0.1) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from requests<2.25.0->ChildProject==0.0.1) (2020.12.5)
Requirement already satisfied: boto in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (2.49.0)
Requirement already satisfied: iso8601 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (0.1.14)
Requirement already satisfied: PyGithub in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (1.54.1)
Requirement already satisfied: appdirs in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (1.4.4)
Requirement already satisfied: whoosh in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (2.7.4)
Requirement already satisfied: patool>=1.7 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (1.12)
Requirement already satisfied: humanize in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (3.3.0)
Requirement already satisfied: annexremote in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (1.5.0)
Requirement already satisfied: fasteners>=0.14 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (0.16)
Requirement already satisfied: keyring>=8.0 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (23.0.1)
Requirement already satisfied: keyrings.alt in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (4.0.2)
Requirement already satisfied: msgpack in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (1.0.2)
Requirement already satisfied: tqdm in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (4.59.0)
Requirement already satisfied: jsmin in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (2.2.2)
Requirement already satisfied: simplejson in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (3.17.2)
Requirement already satisfied: wrapt in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from datalad->ChildProject==0.0.1) (1.12.1)
Requirement already satisfied: six in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from fasteners>=0.14->datalad->ChildProject==0.0.1) (1.15.0)
Requirement already satisfied: jeepney>=0.4.2 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from keyring>=8.0->datalad->ChildProject==0.0.1) (0.6.0)
Requirement already satisfied: SecretStorage>=3.2 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from keyring>=8.0->datalad->ChildProject==0.0.1) (3.3.1)
Requirement already satisfied: importlib-metadata>=3.6 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from keyring>=8.0->datalad->ChildProject==0.0.1) (3.10.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from importlib-metadata>=3.6->keyring>=8.0->datalad->ChildProject==0.0.1) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from importlib-metadata>=3.6->keyring>=8.0->datalad->ChildProject==0.0.1) (3.4.1)
Requirement already satisfied: cryptography>=2.0 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from SecretStorage>=3.2->keyring>=8.0->datalad->ChildProject==0.0.1) (3.4.7)
Requirement already satisfied: cffi>=1.12 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from cryptography>=2.0->SecretStorage>=3.2->keyring>=8.0->datalad->ChildProject==0.0.1) (1.14.5)
Requirement already satisfied: pycparser in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from cffi>=1.12->cryptography>=2.0->SecretStorage>=3.2->keyring>=8.0->datalad->ChildProject==0.0.1) (2.20)
Requirement already satisfied: future in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from annexremote->datalad->ChildProject==0.0.1) (0.18.2)
Requirement already satisfied: setuptools in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from humanize->datalad->ChildProject==0.0.1) (40.6.2)
Requirement already satisfied: MarkupSafe>=0.23 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from jinja2->ChildProject==0.0.1) (1.1.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from pandas->ChildProject==0.0.1) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from pandas->ChildProject==0.0.1) (2020.4)
Requirement already satisfied: python-magic<0.5,>=0.4 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from panoptes-client->ChildProject==0.0.1) (0.4.22)
Requirement already satisfied: redo>=1.7 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from panoptes-client->ChildProject==0.0.1) (2.0.4)
Requirement already satisfied: pyjwt<2.0 in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from PyGithub->datalad->ChildProject==0.0.1) (1.7.1)
Requirement already satisfied: deprecated in /scratch1/home/acristia/ChildProjectVenv/lib/python3.6/site-packages (from PyGithub->datalad->ChildProject==0.0.1) (1.2.12)
Installing collected packages: importlib-resources
Successfully installed importlib-resources-5.1.2

(ChildProjectVenv) [acristia@oberon solomon-data]$ python scripts/sample_segments.py

/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/frame.py:4174: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,
Traceback (most recent call last):
  File "scripts/sample_segments.py", line 20, in <module>
    random_sampler.sample()
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/pipelines/samplers.py", line 180, in sample
    self.segments = self.segments.groupby('recording_filename').sample(self.sample_size)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2865, in sample
    for (_, obj), w in zip(self, ws)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2865, in <listcomp>
    for (_, obj), w in zip(self, ws)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/pandas/core/generic.py", line 4993, in sample
    locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
  File "mtrand.pyx", line 954, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'

@alecristia (Collaborator, Author)

Neither of the following worked, even in a virtual environment:

pip3 install git+https://github.com/LAAC-LSCP/ChildProject.git --upgrade
pip install git+https://github.com/LAAC-LSCP/ChildProject.git --upgrade

However, uninstalling and reinstalling got rid of the error:

pip uninstall ChildProject
pip install git+https://github.com/LAAC-LSCP/ChildProject.git --upgrade

Then the script runs.

@alecristia (Collaborator, Author)

In the zooniverse section, I got:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts/zoo_segments.py", line 25, in <module>
    threads = 2
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/pipelines/zooniverse.py", line 176, in extract_chunks
    self.chunks = pool.map(self.split_recording, segments)
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
pydub.exceptions.CouldntDecodeError: Decoding failed. ffmpeg returned error code: 1
Output from ffmpeg/avlib:
ffmpeg version 2.8.15 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-28)
configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' --extra-ldflags='-Wl,-z,relro ' --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-version3 --enable-bzlib --disable-crystalhd --enable-gnutls --enable-ladspa --enable-libass --enable-libcdio --enable-libdc1394 --disable-indev=jack --enable-libfreetype --enable-libgsm --enable-libmp3lame --enable-openal --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libv4l2 --enable-libx264 --enable-libx265 --enable-libxvid --enable-x11grab --enable-avfilter --enable-avresample --enable-postproc --enable-pthreads --disable-static --enable-shared --enable-gpl --disable-debug --disable-stripping --shlibdir=/usr/lib64 --enable-runtime-cpudetect
libavutil 54. 31.100 / 54. 31.100
libavcodec 56. 60.100 / 56. 60.100
libavformat 56. 40.101 / 56. 40.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 40.101 / 5. 40.101
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.101 / 1. 2.101
libpostproc 53. 3.100 / 53. 3.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from './recordings/raw/01_CW02_CH02_LM03_LM40_190619.WAV':
Metadata:
encoder : Lavf56.40.101
Duration: 16:42:35.34, bitrate: 128 kb/s
Stream #0:0: Audio: adpcm_ima_wav ([17][0][0][0] / 0x0011), 16000 Hz, 2 channels, s16p, 128 kb/s
Unknown encoder 'pcm_s4le'

This was because I was using the raw recordings, rather than the converted recordings.
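
(For the record, a sketch of the path difference; the layout follows the cheatsheet's profile setting:)

# with a conversion profile, recordings are read from recordings/converted/<profile>/
profile = 'standard'
recording = '01_CW02_CH02_LM03_LM40_190619.WAV'
path = (f'./recordings/converted/{profile}/{recording}' if profile
        else f'./recordings/raw/{recording}')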

@alecristia (Collaborator, Author)

I'm very close, but not quite done!
I'm at the step where I extract & upload segments to zooniverse, the upload phase in my cheatsheet.

On oberon, I'm doing:

source ~/ChildProjectVenv/bin/activate
nohup python scripts/zoo_segments.py &

And getting:

Traceback (most recent call last):
  File "scripts/zoo_segments.py", line 3, in <module>
    from ChildProject.projects import ChildProject
ImportError: No module named ChildProject.projects
exported chunks metadata to samples/CHI_FEM/random/chunks_20210424_213335.csv
exported extract-chunks parameters to samples/CHI_FEM/random/parameters_20210424_213335.yml
Traceback (most recent call last):
  File "scripts/zoo_segments.py", line 32, in <module>
    set_prefix = 'ac_20210421'
TypeError: upload_chunks() missing 1 required positional argument: 'set_name'
extracting chunks from ./recordings/converted/standard/01_CW02_CH02_LM03_LM40_190619.WAV...
samples/CHI_FEM/random/chunks/01_CW02_CH02_LM03_LM40_190619_30942616_30943116.wav already exists, exportation skipped.

Note that I added a set_name to my script (although the sample script didn't have this).

@alecristia (Collaborator, Author)

Also, datalad save -m "adding record of upload script" is very slow -- probably because I didn't make the right decision regarding where to save the extracts.

@lucasgautheron (Collaborator)

Are you sure nohup is preserving the environment?

I would suggest running the script in a screen instead. You can start a screen by running screen, then do source ~/ChildProjectVenv/bin/activate and run the script.

You can detach from the screen by pressing Ctrl+a, then d.

You can also do screen -ls to list all running screens, and screen -r [screen] to reattach to one of them.
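
Putting that together (the session name is just an example):

screen -S zooniverse                    # start a named screen session
source ~/ChildProjectVenv/bin/activate  # activate the environment inside it
python scripts/zoo_segments.py
# Ctrl+a d detaches; screen -ls lists sessions; screen -r zooniverse reattaches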

@lucasgautheron (Collaborator) commented Apr 27, 2021

Also, datalad save -m "adding record of upload script" is very slow -- probably because I didn't make the right decision regarding where to save the extracts.

Yes, I think they should not be saved. That's like 200,000 files in your case! Remember you can speed up most datalad operations by using the -J switch, which specifies the number of parallel jobs.
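
For instance (the number of jobs is just an example):

datalad save -J 4 -m "adding record of upload script"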

@alecristia (Collaborator, Author)

Also, datalad save -m "adding record of upload script" is very slow -- probably because I didn't make the right decision regarding where to save the extracts.

Yes, I think they should not be saved. That's like 200.000 files in your case! Remember you can speed up most datalad operations by using the -J switch, specifying the amount of threads to run.

I'm sorry, I'm not sure I understand how to fix the situation and/or how to do this better next time. Let me lay out some possible lessons:

  • I should never use samples/ within the dataset folder as the destination for extracts. I don't think you're saying this, right?
  • If I do declare that as the place to put the extracts, then I need to stop these files from being indexed. Going over the VanDam tutorial, I didn't find an example, since we only teach people how to avoid getting their files annexed. So I looked at the DataLad manual (specifically here), and I think the solution is to put the path to my samples in a .gitignore

So if I had done things properly, I should have done this before actually creating the samples:

echo "samples/CHI_FEM/*" >> .gitignore
datalad save -m "ignore extracts folder" .gitignore

Sadly, that's not what I did, so now even doing datalad status is super slow because of the zillion files.

I can keep reading the manual, but if you already know a way in which I can fix my previous error, that would be really helpful!

@lucasgautheron (Collaborator) commented Apr 30, 2021

I think the best way is the one you described: you can leave your samples in the dataset, but make sure you add a .gitignore file beforehand.

Now, in order to recover a clean dataset, assuming the chunks were added in the last commit, you can do:

git reset HEAD~1
echo "samples/CHI_FEM/chunks/*" >> .gitignore
datalad save -m "ignore extracts folder" .gitignore
datalad save "samples/CHI_FEM/" -m "adding samples"

(Something like this should work)

For further clean up, you should remove the dangling chunks from the annex as well (see https://git-annex.branchable.com/walkthrough/unused_data/)
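
Following that walkthrough, the cleanup would look something like this:

git annex unused          # list annexed content no longer referenced anywhere
git annex dropunused all  # drop everything reported by the previous command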

@alecristia (Collaborator, Author)

great, and to check whether that's the case, I can do git log -n 1 and look at the message of my last commit

@alecristia (Collaborator, Author) commented Apr 30, 2021

It's the last mile! The last error is:

child-project zooniverse upload-chunks 'samples/CHI_FEM/random/' --set-name ac_20210430 --chunks 'samples/CHI_FEM/random/samples.csv' --project-id 14957

yields:

usage: child-project [-h]
{validate,import-annotations,merge-annotations,remove-annotations,rename-annotations,import-data,overview,compute-durations,convert,sampler,zooniverse,eaf-builder,anonymize}
...
child-project: error: unrecognized arguments: samples/CHI_FEM/random/

https://childproject.readthedocs.io/en/latest/zooniverse.html#chunk-upload

shows:

child-project zooniverse upload-chunks /path/to/dataset --help
usage: child-project zooniverse upload-chunks [-h] --chunks CHUNKS
--project-id PROJECT_ID
--set-name SET_NAME

I don't see my error, do you?

@alecristia reopened this on May 3, 2021
@alecristia (Collaborator, Author)

The error was that the command should have been:

child-project zooniverse upload-chunks --set-name ac_20210430 --chunks 'samples/CHI_FEM/random/chunks_20210430_112933.csv' --project-id 14957 --zooniverse-login acristia --zooniverse-pwd MYPASSWORD

That did create the subject set on zooniverse, but it didn't push any clips. Here is the output:

uploading chunk 1_CW5_CH5_AJ09_AJ10_190710.WAV (23668064,23668564)
Traceback (most recent call last):
  File "/scratch1/home/acristia/ChildProjectVenv/bin/child-project", line 11, in <module>
    load_entry_point('ChildProject==0.0.1', 'console_scripts', 'child-project')()
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/cmdline.py", line 311, in main
    args.func(args)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/cmdline.py", line 31, in <lambda>
    _parser.set_defaults(func = lambda args: cls().run(**vars(args)))
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/pipelines/zooniverse.py", line 371, in run
    return self.upload_chunks(**kwargs)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/pipelines/zooniverse.py", line 291, in upload_chunks
    subject.save()
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/subject.py", line 144, in save
    log_args=False,
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/redo/__init__.py", line 170, in retry
    return action(*args, **kwargs)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/panoptes.py", line 815, in save
    etag=self.etag
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/panoptes.py", line 404, in post
    retry=retry,
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/panoptes.py", line 281, in json_request
    json_response['errors']
panoptes_client.panoptes.PanoptesAPIException: User has uploaded 12778 subjects of 10000 maximum

And a snapshot of the Zooniverse subject section:

[screenshot of the Zooniverse subject sets page]

It looks like the error is that I exceeded the 10k quota.

So I tried again, this time specifying an amount:

child-project zooniverse upload-chunks --set-name ac_20210430 --chunks 'samples/CHI_FEM/random/chunks_20210430_112933.csv' --project-id 14957 --zooniverse-login acristia --zooniverse-pwd MYPASSWORD --amount 9999

Unfortunately, I get the same error:

uploading chunk 1_CW5_CH5_AJ09_AJ10_190710.WAV (23668064,23668564)
Traceback (most recent call last):
  File "/scratch1/home/acristia/ChildProjectVenv/bin/child-project", line 11, in <module>
    load_entry_point('ChildProject==0.0.1', 'console_scripts', 'child-project')()
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/cmdline.py", line 311, in main
    args.func(args)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/cmdline.py", line 31, in <lambda>
    _parser.set_defaults(func = lambda args: cls().run(**vars(args)))
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/pipelines/zooniverse.py", line 371, in run
    return self.upload_chunks(**kwargs)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/ChildProject/pipelines/zooniverse.py", line 291, in upload_chunks
    subject.save()
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/subject.py", line 144, in save
    log_args=False,
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/redo/__init__.py", line 170, in retry
    return action(*args, **kwargs)
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/panoptes.py", line 815, in save
    etag=self.etag
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/panoptes.py", line 404, in post
    retry=retry,
  File "/scratch1/home/acristia/ChildProjectVenv/lib64/python3.6/site-packages/panoptes_client/panoptes.py", line 281, in json_request
    json_response['errors']
panoptes_client.panoptes.PanoptesAPIException: User has uploaded 12778 subjects of 10000 maximum

@lucasgautheron (Collaborator) commented May 3, 2021

My understanding is that the 10,000 quota is a limit for the whole project, and that you have to ask the administrators to have it increased.
EDIT: you can add me as an administrator of the project so that I can ask Zooniverse's staff to increase the quota on your behalf.

@alecristia (Collaborator, Author)

certainly, we'll ask - in fact, we also need to ask whether we can bypass the beta phase (given that we already did so with our other project). But before we do that, I'd like to try out the interface with some sample data.

Is there a way in which I can push up just a few clips? I thought the "amount" flag did that, as in: child-project zooniverse upload-chunks --set-name ac_20210430 --chunks 'samples/CHI_FEM/random/chunks_20210430_112933.csv' --project-id 14957 --zooniverse-login acristia --zooniverse-pwd MYPASSWORD --amount 9999

@lucasgautheron (Collaborator) commented May 4, 2021

The --amount flag does exactly that - at least it should. But you've already reached your project quota, so even one clip (--amount 1) will be too many.

@alecristia (Collaborator, Author)

but there are no subjects -- so how can it think that we've gone over our quota?

Also, notice that in my screenshot, it says "The project has 0 uploaded subjects. You have uploaded 0 subjects from an allowance of 10000. Your uploaded subject count is the tally of all subjects (including those deleted) that your account has uploaded through the project builder or Zooniverse API. Please contact us to request changes to your allowance."

@lucasgautheron (Collaborator)

Weird! It could be because too many subjects were uploaded the first time and the upload did not complete (because of the exception thrown by the API).
So there might be a bunch of dangling subjects uploaded with no subject set.
(I don't know, I am really taking wild guesses.)

I'll try to see if there's a way to find invisible subjects like this.
In any case, maybe try to have your quota increased and ask Zooniverse about this at the same time. IMO this should be considered a bug.

I realised I have access to your project, so I can take care of it.
How urgent is this?

@alecristia (Collaborator, Author)

not urgent, but if we could get a couple of subjects in there so that I can test the project's interface, that would unblock me to ask them for permission etc.

@lucasgautheron (Collaborator)

Well, I just managed to get chunks through, on the same project and subject set. Can you give it another try? Try low values for --amount (e.g. 1 to begin with).

@alecristia (Collaborator, Author)

note that this affects my account specifically (not the project):
https://www.zooniverse.org/talk/18/2002495?comment=3266579&page=1
