Create Audio Feature in SDK #344

nickyfantasy · 2018-03-28T19:09:25Z

* Add APIs to record audio in the SDK
* Add the corresponding APIs in pybind, storage.py, and sdk.h
* Implement reservoir sampling when collecting audio samples
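Reservoir sampling keeps a uniform random subset of at most k items from a stream whose length is not known in advance. As a generic illustration of the technique (Algorithm R), not the code merged in this PR, here is a minimal sketch; the class name `Reservoir` and its methods are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Minimal reservoir sampler (Algorithm R): keeps a uniform random subset of
// at most `capacity` items from a stream whose length is not known upfront.
template <typename T>
class Reservoir {
 public:
  Reservoir(std::size_t capacity, unsigned seed)
      : capacity_(capacity), rng_(seed) {
    assert(capacity_ > 0);
  }

  // Offers the next stream item; it may or may not end up in the reservoir.
  void Add(const T& item) {
    ++seen_;
    if (items_.size() < capacity_) {
      // The first `capacity` items are always kept.
      items_.push_back(item);
      return;
    }
    // The seen_-th item replaces a random slot with probability capacity/seen_.
    std::uniform_int_distribution<std::size_t> dist(0, seen_ - 1);
    std::size_t slot = dist(rng_);
    if (slot < capacity_) items_[slot] = item;
  }

  const std::vector<T>& items() const { return items_; }
  std::size_t seen() const { return seen_; }

 private:
  std::size_t capacity_;
  std::size_t seen_ = 0;
  std::vector<T> items_;
  std::mt19937 rng_;
};
```

Each stream item ends up in the reservoir with probability k/n after n items, so memory stays bounded no matter how many audio steps are logged.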

jetfuel · 2018-03-28T19:16:23Z

@nickyfantasy Please run pre-commit to format the code. Travis CI will check the formatting.

daming-lu · 2018-03-28T19:24:09Z

visualdl/logic/sdk.cc

+void Audio::SetSample(int index,
+                      int sample_rate,
+                      const std::vector<value_t>& data) {
+  CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100";


Should the message just say that the sample rate must be a positive number?

daming-lu · 2018-03-28T19:24:41Z

visualdl/logic/sdk.cc

+                      int sample_rate,
+                      const std::vector<value_t>& data) {
+  CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100";
+  CHECK_LT(index, num_samples_);


We can add error messages for these two checks as well.
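glog's CHECK_* macros accept a streamed message, as the CHECK_GT line above already does, so both checks can state what was expected and what was received. To keep the example self-contained without glog, the sketch below uses standard exceptions instead; `ValidateSampleArgs` is a hypothetical helper, and the `index >= 0` lower bound is an extra assumption not present in the diff:

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the checks in Audio::SetSample: each failure
// reports what was expected and which value was actually received.
void ValidateSampleArgs(int index, int sample_rate, int num_samples) {
  if (sample_rate <= 0) {
    std::ostringstream msg;
    msg << "sample_rate should be a positive number (e.g. 8000 or 44100), got "
        << sample_rate;
    throw std::invalid_argument(msg.str());
  }
  if (index < 0 || index >= num_samples) {
    std::ostringstream msg;
    msg << "index should be in [0, " << num_samples << "), got " << index;
    throw std::out_of_range(msg.str());
  }
}
```

With glog the same idea reads `CHECK_LT(index, num_samples_) << "index should be less than num_samples_";`.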

daming-lu · 2018-03-28T19:41:22Z

visualdl/logic/sdk.h

+    struct AudioRecord {
+        int step_id;
+        int sample_rate;
+        std::vector<int> data;


So here the data is 'int' instead of 'float'?

Yes. When we write the data, we convert the floats to a string; when we read the data back, we convert the binary string to ints.

daming-lu · 2018-03-28T19:42:18Z

visualdl/logic/sdk.cc

+          << "g_log_dir should be set in LogReader construction";
+  BinaryRecordReader brcd(GenBinaryRecordDir(g_log_dir), filename);
+
+  std::transform(brcd.data.begin(),


is brcd.data the same as res.data?

brcd.data is the data in the string format it was saved to the file in. When we read the data, we convert it to integers, and that becomes res.data.
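The read path described above (raw bytes in brcd.data turned into integers in res.data via std::transform) can be sketched as follows. `DecodeSamples` is a hypothetical name, and the int8 cast mirrors the later commit in this PR that switched reading to int8; it is an assumption about the intended byte layout, not the merged code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical read-side helper: turns the raw bytes stored in a binary
// record (like brcd.data) into the integer samples returned to the caller
// (like res.data). Casting through int8_t first means bytes >= 0x80 decode
// to negative values whether or not plain `char` is signed on this platform.
std::vector<int> DecodeSamples(const std::string& raw) {
  std::vector<int> out(raw.size());
  std::transform(raw.begin(), raw.end(), out.begin(), [](char c) {
    return static_cast<int>(static_cast<std::int8_t>(c));
  });
  return out;
}
```

Without the int8_t cast, the same byte could come back as 200 on one platform and -56 on another, because the signedness of plain `char` is implementation-defined.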

daming-lu · 2018-03-28T19:43:34Z

visualdl/logic/sdk.h

+
+    /*
+     * A combined interface for IsSampleTaken and SetSample, simpler but might be
+     * low effience.


daming-lu · 2018-03-28T19:44:22Z

visualdl/python/storage.py

@@ -119,6 +119,16 @@ def text(self, tag):
        check_tag_name_valid(tag)
        return self.reader.get_text(tag)

+    def audio(self, tag):
+        """
+        Get a audio reader with tag


Get 'an' audio

Superjomn · 2018-03-28T21:09:57Z

visualdl/logic/pybind.cc

-             auto tablet = self.tablet(tag);
-             return vs::components::ImageReader(self.mode(), tablet);
-           })
+      .def("get_image", [](vs::LogReader& self, const std::string& tag) {


These seem to be just code-style formatting changes. Each of us has a different clang-format config, which produces some diffs. Maybe we need a unified config so that we all get the same code style?

Paddle uses clang-format 3.8 with the Google C++ style; a different config or version may lead to some diffs.

We can reference Paddle's .clang-format configuration.

yes, I just updated clang-format

Superjomn · 2018-03-28T21:14:41Z

visualdl/logic/sdk.cc

+  CHECK_LE(index, num_records_);
+
+    //convert float vector to char vector
+  std::vector<char> data_str(data.size());


It seems that data_str can be a string directly, with no need to transform it from a vector to a string again.

std::string data_str(data.size(), '\0'); ... BinaryRecord brcd(xxdir, std::move(data_str));

OK, I ended up just using std::string(data.begin(), data.end()) to convert the data vector directly to a string.
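One caveat with `std::string(data.begin(), data.end())`: each element is converted implicitly, and for a `std::vector<float>` that is a float-to-char conversion, which is undefined behavior when the truncated value does not fit in `char`. A minimal sketch that builds the string directly (no intermediate `std::vector<char>`) and moves it into the record, assuming samples should be clamped to the int8 range (an assumption, not what this PR does). `ByteRecord` is a hypothetical stand-in for BinaryRecord, and `EncodeSamples` is also hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type standing in for BinaryRecord: it takes ownership
// of the serialized bytes, so callers can std::move a string into it.
struct ByteRecord {
  explicit ByteRecord(std::string bytes) : data(std::move(bytes)) {}
  std::string data;
};

// Serializes samples straight into a std::string, one byte per sample.
// Assumes finite input (NaN is not handled).
ByteRecord EncodeSamples(const std::vector<float>& samples) {
  std::string payload;
  payload.reserve(samples.size());
  for (float v : samples) {
    // float -> integer conversion is undefined if the truncated value is out
    // of range, so truncate and clamp to [-128, 127] before converting.
    float clamped = std::clamp(std::trunc(v), -128.0f, 127.0f);
    payload.push_back(static_cast<char>(static_cast<std::int8_t>(clamped)));
  }
  return ByteRecord(std::move(payload));
}
```

If the input is already a vector of int8 values, the plain range constructor is safe and simpler.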

Superjomn · 2018-03-28T21:22:31Z

visualdl/logic/sdk.h

+    struct AudioRecord {
+        int step_id;
+        int sample_rate;
+        std::vector<int> data;


Is short, unsigned char, or char enough to cover the range of audio values?

https://cn.mathworks.com/help/matlab/ref/audiorecorder.getaudiodata.html?s_tid=gn_loc_drop

Here only int16, int8, and uint8 are used, not int32.

Just a suggestion, not that important; int works fine.

You are right. When we built a speech recognition app before, we just used byte / int8.
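For a sense of what the narrower types involve, here is a sketch that quantizes samples to int16, the widest type in the MATLAB reference above. It assumes the input floats are normalized to [-1.0, 1.0] and uses a symmetric scale of 32767, a common convention rather than anything this PR specifies; `QuantizeToInt16` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: quantizes samples assumed to be normalized to
// [-1.0, 1.0] into int16. Out-of-range input is clamped first; values are
// rounded to the nearest integer (halves away from zero).
std::vector<std::int16_t> QuantizeToInt16(const std::vector<float>& samples) {
  std::vector<std::int16_t> out;
  out.reserve(samples.size());
  for (float v : samples) {
    float clamped = std::clamp(v, -1.0f, 1.0f);
    out.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
  }
  return out;
}
```

Each int16 sample takes 2 bytes, versus typically 4 for int, so storing int16 (or int8, as in the final commit) roughly halves (or quarters) the record size.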

jetfuel · 2018-03-28T23:18:40Z

visualdl/logic/pybind.cc

@@ -219,6 +233,61 @@ PYBIND11_MODULE(core, m) {
      .def("total_records", &cp::TextReader::total_records)
      .def("size", &cp::TextReader::size);

+  py::class_<cp::Audio>(m, "AudioWriter", R"pbdoc(


Would it be weird to have documentation published on the website while the code is not in the pip release? I am not sure what the best approach is here.

jetfuel · 2018-03-28T23:19:10Z

visualdl/logic/pybind.cc

+                  )pbdoc");
+
+  py::class_<cp::AudioReader::AudioRecord>(m, "AudioRecord")
+      // TODO(ChunweiYan) make these copyless.


Either remove the TODO or update it with your name.

jetfuel · 2018-03-28T23:24:30Z

visualdl/logic/sdk.cc

+  num_records_ = 0;
+}
+
+int Audio::IsSampleTaken() {


Minor point: the function name implies that it returns a bool, but it returns an index.
Maybe rename it to NextRandSampleIndex, or add a comment here explaining the logic.
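One way to make the return value self-describing is to return `std::optional<int>`: a slot index when the record is kept, `std::nullopt` when it is skipped. A minimal sketch using the name suggested above; the parameters and the use of `std::mt19937` are assumptions, not the merged `IndexOfSampleTaken` implementation:

```cpp
#include <cassert>
#include <optional>
#include <random>

// Hypothetical index-returning reservoir step. `record_id` is the 0-based
// position of the incoming record in the stream. Returns the reservoir slot
// it should be written to, or std::nullopt if the record is skipped.
std::optional<int> NextRandSampleIndex(int record_id, int num_samples,
                                       std::mt19937& rng) {
  assert(num_samples > 0 && record_id >= 0);
  // Fill phase: the first num_samples records always get their own slot.
  if (record_id < num_samples) return record_id;
  // Afterwards, keep the record with probability num_samples / (record_id + 1).
  std::uniform_int_distribution<int> dist(0, record_id);
  int slot = dist(rng);
  if (slot < num_samples) return slot;
  return std::nullopt;
}
```

Callers then write `if (auto slot = NextRandSampleIndex(...)) { /* overwrite *slot */ }`, which avoids a magic value such as -1.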

Create Audio Feature in SDK

81d54e6

nickyfantasy requested review from Superjomn, jetfuel and daming-lu March 28, 2018 19:09

daming-lu reviewed Mar 28, 2018

Superjomn reviewed Mar 28, 2018

nickyfantasy added 2 commits March 28, 2018 15:42

fix clang format and update based on comment

0be2512

use int_8 for reading records and convert string directly from vector

d843de7

jetfuel reviewed Mar 28, 2018

refactor isSampleTaken to IndexOfSampleTaken

5d1b040

daming-lu previously approved these changes Mar 29, 2018

fix clang format again

3de9514

nickyfantasy dismissed daming-lu’s stale review via 3de9514 March 29, 2018 00:25

daming-lu approved these changes Mar 29, 2018

nickyfantasy merged commit 37a3559 into PaddlePaddle:develop Mar 29, 2018
