Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw:add a s3 API of make torrent for a object #9589

Merged
merged 1 commit into from Jul 21, 2016
Merged

Conversation

zhouruisong
Copy link
Contributor

@zhouruisong zhouruisong commented Jun 8, 2016

when you put a object, a torrent data of the object will be produced that saved in omap.When you execute the command gettorrent of a object, the torrent data of the object will be return from omap. The torrent data also be removed when the object is removed.
Signed-off-by: zhouruisong 236131368@qq.com

@oritwas
Copy link
Member

oritwas commented Jun 9, 2016

@zhouruisong , can you open a tracker issue for this new feature and it to the commit message
(Fixes: )
why do you think you need a new pool?

@mattbenjamin
Copy link
Contributor

@zhouruisong I'd like to get feedback from various folks on the different aspects of this, such as the torrent pool concept (e.g., how that might relate to storage classes), and maybe whether this capability should have its own RGWOp.

@dang
Copy link
Contributor

dang commented Jun 10, 2016

@zhouruisong So, I'm not liking a separate pool to store torrent files. How about, instead, the torrent file is generated each time, and the computationally intensive portions of it (such as the SHA) are stored on the primary object as attributes? That way, when the object is removed it's torrent is removed, and no extra pools or related files are required.

@zhouruisong
Copy link
Contributor Author

@oritwas tracker.ceph.com/issues/16241

@zhouruisong
Copy link
Contributor Author

zhouruisong commented Jun 12, 2016

@oritwas I need a new torrent pool because I want to save torrent files Individually. it is acceptable to save pool that has existed. I will modify it in the future.

@zhouruisong
Copy link
Contributor Author

@dang It will cost some time to produce torrent file when you get a object. I test that it cost about two seconds to produce a torrent file. The torrent file's source objcet is about 1.5G. So, I save the torrent file into ceph. if it exists, we wiil get it quickly. I will remove the torrent pool and save torrent file in existed pool. thanks!

@zhouruisong
Copy link
Contributor Author

zhouruisong commented Jun 12, 2016

@dang Now the torrent file name and torrent pool name are saved in attributes, torrent file is saved in torrent pool.Do you mean save torrent file in omap?

@mattbenjamin
Copy link
Contributor

@zhouruisong I think @dang is suggesting a strategy to materialize checksums/hashes, but not the torrent data; if someone really wants this kind of space amplification, I don't see how to argue against it for such users, provided there is a durable perf benefit (but that's worth digging more at), but b) I don't think we should force the space amp on everyone

@zhouruisong
Copy link
Contributor Author

@mattbenjamin The torrent data is only 64k for a object whose capacity is 1.5G. A object whose capacity is 100G only occupy a few megabytes. So I thinks it is will not occupy too mush space for a torrent data of a object. I see your worry. I hope you give me some suggestion for the solution. thanks!

torrent.set_encoding(g_conf->torrent_encoding);
torrent.set_origin(g_conf->torrent_origin);
string name = obj.get_orig_obj();
torrent.set_info_name(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that using get_orig_obj() to generate the torrent object names will prevent us from supporting object versioning and object instances. if you overwrite the original object with new data, you'd still find and return the old torrent object for it

i'd lean towards using the omap for this storage so that it's tied to a specific object version/instance, and gets cleaned up automatically when the object is deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I see what you said and I will modify it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need the object name of torrent data beacuse there is a field in torrent file format that is called name。 For example: There is a object named 1.MP4 and when you get its torrent data, a torrent file named 1.MP4.torrent will be produced.But in 1.MP4.torrent, there is a filed name to save the source file. I need get this name when put object and save it into torrent data when producing torrent data. I do not need get the name when get object. If I do not get the object name in geting object, where can I get it? thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_orig_obj() is fine for the torrent info name and filename of the .torrent. the problem was only in the object name used to store it in rados

@zhouruisong
Copy link
Contributor Author

@dang I have updated my code according to your suggestion, please help me check thanks!

@zhouruisong
Copy link
Contributor Author

@mattbenjamin I have updated my code according to your suggestion, please help me check thanks!

@zhouruisong
Copy link
Contributor Author

@cbodley I have updated my code according to your suggestion, please help me check thanks!

@zhouruisong
Copy link
Contributor Author

@liewegas I have updated my code according to your suggestion, please help me check thanks!

@mattbenjamin
Copy link
Contributor

@zhouruisong lgtm, looking for feedback from @dang

@zhouruisong
Copy link
Contributor Author

zhouruisong commented Jun 16, 2016

@dang I have modified my code according to your suggestion. Now, torrent data has been saved into omap,and it will be removed when the object is removed. please give me your advice, thanks!


return op_ret;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seed class belongs in a separate file, like rgw_torrent.cc/h

@zhouruisong
Copy link
Contributor Author

@dang I have updated my code again according to your advice,please help me check, thanks!

@zhouruisong
Copy link
Contributor Author

@cbodley I have updated my code again according to your advice,please help me check, thanks!

@dang
Copy link
Contributor

dang commented Jun 17, 2016

LGTM

@mattbenjamin mattbenjamin assigned dang and unassigned mattbenjamin Jun 29, 2016
@zhouruisong
Copy link
Contributor Author

@dang What is the progress of the feature? I have test it and not find any problem

When you execute the command gettorrent of a object, a torrent file will be produced and returned.
The torrent also will be save into a pool named default.rgw.torrent.
If the torrent of a object exists in default.rgw.torrent, it will be returned.

Signed-off-by: zhouruisong <236131368@qq.com>
@zhouruisong zhouruisong force-pushed the master branch 2 times, most recently from 6b534d2 to 53d2829 Compare July 6, 2016 09:21
@dang
Copy link
Contributor

dang commented Jul 6, 2016

@zhouruisong I'm waiting for unit tests to be added to the PR, as requested by @cbodley

@zhouruisong
Copy link
Contributor Author

zhouruisong commented Jul 7, 2016

@dang I want to know how to write test code? can you give me a example? Do I have to change some interface of the class seed from private to public?

@cbodley
Copy link
Contributor

cbodley commented Jul 7, 2016

@zhouruisong i see what you mean: all of the encoding logic is tightly coupled to your seed class. moving that stuff into a separate class (or maybe free functions?) would make it easier to write tests against. consider an interface like this, which is similar to Ceph's existing serialization framework:

// control characters
void bencode_dict(bufferlist& bl) { bl.append('d'); }
void bencode_list(bufferlist& bl) { bl.append('l'); }
void bencode_end(bufferlist& bl) { bl.append('e'); }

// single values
void bencode(int i, bufferlist& bl);
void bencode(const std::string& str, bufferlist& bl);

// dictionary elements
void bencode(const std::string& key, int value, bufferlist& bl);
void bencode(const std::string& key, const std::string& value, bufferlist& bl);

Then an example unit test could look like this:

TEST(Bencode, Dict)
{
  bufferlist bl;

  bencode_dict(bl);
    bencode("foo", 5, bl);
    bencode("bar", "baz", bl);
  bencode_end(bl);

  ASSERT_EQ("d3fooi5e3bar3baze", bl.c_str());
}

@cbodley
Copy link
Contributor

cbodley commented Jul 7, 2016

p.s. you can find a simple example test in src/test/test_pageset.cc. copy that to src/test/rgw/test_bencode.cc to get started

@zhouruisong
Copy link
Contributor Author

@cbodley Do I run make check to run my unit test? All test will be run, how to make it check my own unit test?

@zhouruisong
Copy link
Contributor Author

@cbodley I have update my code and add some unit test codes, but when I run make check, I do not konw my unit test have been excuted, please look at my unit test code, I do not know whether the codes in CMakeLists.txt and Makefile-client.am have been executed. Where did I ignore?

${CRYPTO_LIBS}
)
set_target_properties(ceph_test_rgw_bencode PROPERTIES COMPILE_FLAGS
${UNITTEST_CXX_FLAGS})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all you should need here:

add_executable(unittest_rgw_bencode test_rgw_bencode.cc)
add_ceph_unittest(unittest_rgw_bencode ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/unittest_rgw_bencode)
target_link_libraries(unittest_rgw_bencode rgw_a)

the add_ceph_unittest() part is what adds it to make check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley I have updated my code. but the unittest_rgw_bencode can not generate when I run make check.
I modify add "#define ENABLE_CLIENT 1" in src/acconfig.h and run make check again. the unittest_rgw_bencode can not generate. I think there are something configrations that I have ignored. I want to know why unittest_rgw_bencode can not be produced? Is it wrong with my code? thanks!

@zhouruisong
Copy link
Contributor Author

@dang I have finished unit test code.

@@ -13,7 +13,8 @@ TEST(Bencode, Str)
decode.bencode("bar", bl);
decode.bencode("baz", bl);

ASSERT_EQ("3:foo3:bar3:baz", bl.c_str());
const char *p = "3:foo3:bar3:baz";
ASSERT_EQ(0, strcmp(p, bl.c_str()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, ASSERT_STREQ() is what you want for const char* comparison (see https://github.com/google/googletest/blob/master/googletest/docs/Primer.md)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley I have updated this code, thank you for your advice

@zhouruisong
Copy link
Contributor Author

@cbodley thank you for your advice and I will update my code.

@dang
Copy link
Contributor

dang commented Jul 15, 2016

This is looking good. @cbodley is out today, but hopefully he can do a final review Monday to make sure all his comments have been addressed.

After that, could you squash all the commits down into a single commit with a good commit message?

@dang
Copy link
Contributor

dang commented Jul 21, 2016

@zhouruisong I had to revert this. It was causing crashes on normal put. It appears that this was not the latest version we reviewed. Did something get messed up in the squash?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants