Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Audio Feature in SDK #344

Merged
merged 5 commits into from
Mar 29, 2018

Conversation

nickyfantasy
Copy link
Contributor

  • Add apis to record audio in SDK
  • Add corresponding apis in pybind, storage.py, sdk.h
  • Implement reservoir sampling when collecting audio samples

* Add apis to record audio in SDK
* Add corresponding apis in pybind, storage.py, sdk.h
* Implement reservoir sampling when collecting audio samples
@jetfuel
Copy link
Collaborator

jetfuel commented Mar 28, 2018

@nickyfantasy Please run pre-commit to format the code. The Travis CI will check the format.

void Audio::SetSample(int index,
int sample_rate,
const std::vector<value_t>& data) {
CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a positive number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

int sample_rate,
const std::vector<value_t>& data) {
CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100";
CHECK_LT(index, num_samples_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add error messages for these 2 as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

struct AudioRecord {
int step_id;
int sample_rate;
std::vector<int> data;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here the data is 'int' instead of 'float'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, when we write data in, we convert float to string, when we read the data out, we convert binary to int

<< "g_log_dir should be set in LogReader construction";
BinaryRecordReader brcd(GenBinaryRecordDir(g_log_dir), filename);

std::transform(brcd.data.begin(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is brcd.data the same as res.data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brcd.data is the data in string format when we saved in file, when we read the data we convert to integer that becomes res.data


/*
* A combined interface for IsSampleTaken and SetSample, simpler but might be
* low effience.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

efficiency

@@ -119,6 +119,16 @@ def text(self, tag):
check_tag_name_valid(tag)
return self.reader.get_text(tag)

def audio(self, tag):
"""
Get a audio reader with tag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get 'an' audio

auto tablet = self.tablet(tag);
return vs::components::ImageReader(self.mode(), tablet);
})
.def("get_image", [](vs::LogReader& self, const std::string& tag) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem just some code style formatting, and we each other have a different clang-format config that results in some diff, maybe we need a unified version of config that makes we have the same code style?

In paddle, the clang-format 3.8 and google c++ style is used, different config and version may lead to some diff.

We can reference paddle's .clang-format configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I just updated clang-format

CHECK_LE(index, num_records_);

//convert float vector to char vector
std::vector<char> data_str(data.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that data_str can directly be a string and no need to tranform from vector to string again.

std::string data_str(data.size());
...
BinaryRecord brcd(xxdir, std::move(data_str));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I end up just use std::string(data.begin(),data.end()) to directly convert the data vector to string

struct AudioRecord {
int step_id;
int sample_rate;
std::vector<int> data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To meet the audio value interval, is short or unsigned char or char is enough?

https://cn.mathworks.com/help/matlab/ref/audiorecorder.getaudiodata.html?s_tid=gn_loc_drop

here just int16, int8, uint8 are used, not int32.

Just a suggestion, not that important, int works good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, when we were doing speech recognition app before, we just use byte / int8

@@ -219,6 +233,61 @@ PYBIND11_MODULE(core, m) {
.def("total_records", &cp::TextReader::total_records)
.def("size", &cp::TextReader::size);

py::class_<cp::Audio>(m, "AudioWriter", R"pbdoc(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be weird to have documentations published on the website but not the code is not in the release pip? I am not sure what's the best approach here.

)pbdoc");

py::class_<cp::AudioReader::AudioRecord>(m, "AudioRecord")
// TODO(ChunweiYan) make these copyless.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either remove the TODO or update it to yours

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya

num_records_ = 0;
}

int Audio::IsSampleTaken() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor stuff, the function name is implying that the function will return a BOOL, but the function returns an index.
Maybe rename the function to NextRandSampleIndex or provide a comment here to explain the logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true

daming-lu
daming-lu previously approved these changes Mar 29, 2018
@nickyfantasy nickyfantasy merged commit 37a3559 into PaddlePaddle:develop Mar 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants