
About generateData() function #11

Closed
ghost opened this issue Apr 18, 2019 · 15 comments

@ghost commented Apr 18, 2019

I recently came across this library in my browser, and as described in the README it looks like exactly what I need. Thanks, @alexmojaki.
So far I've just used the StreamTransferManager sample code with a small modification to the stream generator to handle larger data.

Now I'm having trouble fitting this library to my requirement, and it would be great if you could help me. Here is the whole picture:

  1. I have a ByteArrayOutputStream: ByteArrayOutputStream os = new ByteArrayOutputStream();
  2. Suppose os contains stream data.
  3. I want to upload that stream data to an S3 bucket with your library.

Please guide me. Thanks in advance.
@alexmojaki (Owner)

If you created the ByteArrayOutputStream yourself, then instead of creating it, create a StreamTransferManager with numStreams = 1 and then:

OutputStream os = manager.getMultiPartOutputStreams().get(0);

If for some reason you have to use a ByteArrayOutputStream, e.g. because it's created by some code you don't control, then I don't suggest using this library at all, because it seems you can't avoid keeping all the data in memory anyway.
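
For context, the complete lifecycle being described looks roughly like this (a minimal sketch in the fluent 2.x style used later in this thread; bucketName, key, and s3Client are placeholders, and writeYourDataTo is a hypothetical stand-in for whatever code produces the data):

StreamTransferManager manager = new StreamTransferManager(bucketName, key, s3Client)
        .numStreams(1); // a single writer, so a single part stream

OutputStream os = manager.getMultiPartOutputStreams().get(0);
try {
    writeYourDataTo(os);   // write here instead of into a ByteArrayOutputStream
    os.close();            // flushes and queues the final part
    manager.complete();    // finishes the multipart upload
} catch (Throwable e) {
    manager.abort();       // abandon the upload so no orphaned parts are left behind
    throw new RuntimeException(e);
}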

@ghost (Author) commented Apr 18, 2019

Thanks, @alexmojaki, for the quick reply.
Here is my sample code:

// lots of code, then:
ByteArrayOutputStream os = new ByteArrayOutputStream();
presentation.save(os, SaveFormat.Pptx); // presentation is a com.aspose.slides.Presentation
// end of code

Here com.aspose.slides.Presentation is the class that writes the stream data into os.
From there I want to upload the data to S3 with your library, without consuming memory or disk space.

@ghost (Author) commented Apr 18, 2019

@alexmojaki Any luck?

@alexmojaki (Owner)

This has made me realise that requiring the user to call checkSize regularly is a problem, as the user may not always do the writing themselves. So I have released a new version 2.0.0 which does it automatically. Upgrade the version in your build system (pom.xml or build.gradle or whatever).
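
For example, with Maven that would be a dependency entry along these lines (the groupId/artifactId below are an assumption based on the package name alex.mojaki.s3upload; check the project README for the exact coordinates):

<dependency>
    <groupId>alex.mojaki</groupId>
    <artifactId>s3-stream-upload</artifactId>
    <version>2.0.0</version>
</dependency>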

Then it's as I said, this code should work:

OutputStream os = manager.getMultiPartOutputStreams().get(0);
presentation.save(os, SaveFormat.Pptx);
os.close();

@alexmojaki (Owner)

It will definitely not write to disk at all. AWS should give you some metrics about RAM usage, try processing a big presentation and see what it says. If you don't change the default settings it should use very little memory and be able to process files up to 50GB. How big are these PowerPoint presentations? Are you sure you can't just hold the whole thing in memory and use the usual putObject method?
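
For comparison, the fully in-memory putObject route would look something like this (a sketch against the AWS SDK for Java v1; bucketName and key are placeholders, and the content length must be set when uploading from a stream):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
presentation.save(baos, SaveFormat.Pptx);      // the whole file ends up in memory here
byte[] bytes = baos.toByteArray();

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(bytes.length);       // required when uploading from an InputStream
s3Client.putObject(bucketName, key, new ByteArrayInputStream(bytes), metadata);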

@ghost (Author) commented Apr 21, 2019

I just built a 30 MB pptx file and got these metrics from AWS Lambda:

594679 [main] INFO alex.mojaki.s3upload.StreamTransferManager - Initiated multipart upload to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with full ID Q87ZCg8jNVG1_sLD8WITa7zxixZbC8slK3umuBara6vUzaoFr2cOO_hbMETSdgskr362dFSJbcfY8N2DaKfvOjEArPUTpsn_XSuROgsTewo-
597338 [main] INFO alex.mojaki.s3upload.MultiPartOutputStream - Called close() on [MultipartOutputStream for parts 1 - 10000]
597824 [pool-2-thread-1] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with id Q87ZCg8jN...ROgsTewo-]: Finished uploading [Part number 1 containing 25.59 MB]
597978 [pool-2-thread-1] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with id Q87ZCg8jN...ROgsTewo-]: Finished uploading [Part number 2 containing 5.00 MB]
598148 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with id Q87ZCg8jN...ROgsTewo-]: Completed
END RequestId: 43d0850a-80c1-4adf-a367-1692069d7b6b
REPORT RequestId: 43d0850a-80c1-4adf-a367-1692069d7b6b	Duration: 27341.35 ms	Billed Duration: 27400 ms 	Memory Size: 3000 MB	Max Memory Used: 1020 MB	

It seems 1020 MB was used just to build a 30 MB pptx file, and I'm not using the putObject method anywhere. Here is the code:

String[] keys = input.get("keys"); // keys of the presentation files stored in the S3 bucket
Presentation finalPresentation = new Presentation();
ISlideCollection finalPresentationSlides = finalPresentation.getSlides();

for (int i = 0; i < keys.length; i++) {
    String key_name = keys[i];
    S3ObjectInputStream s3is = null;
    S3Object o = null;
    try {
        o = s3Client.getObject(bucket_name, key_name);
        s3is = o.getObjectContent(); // stream of the presentation file's bytes
    } catch (AmazonServiceException e) {
        System.err.println(e.getErrorMessage());
        System.exit(1);
    }
    Presentation sourcePresentation = new Presentation(s3is);
    ISlide slide = sourcePresentation.getSlides().get_Item(1); // get the second slide from the source presentation
    finalPresentationSlides.addClone(slide); // attach it to the new presentation object
    try {
        s3is.close(); // release objects from RAM on each iteration
        o.close();
        sourcePresentation.dispose();
    } catch (Exception e) {
        e.printStackTrace();
    }
} // end of loop

final StreamTransferManager manager = new StreamTransferManager(bucket_name, "buildppt1.pptx", s3Client);
try {
    MultiPartOutputStream os = manager.getMultiPartOutputStreams().get(0);
    finalPresentation.save(os, SaveFormat.Pptx);
    os.close();
} catch (Exception e) {
    e.printStackTrace();
}
manager.complete();

@alexmojaki (Owner)

I think this is the same issue as #2. The library isn't using the memory; it's just that Java automatically preallocates a lot of it. I don't know if there's a way to configure that in Lambda, but I don't think you need to (I don't think Lambda bills based on actual RAM usage).

Given that you have 3 GB available, I still don't know why you anticipate running out unless you process some massive presentations. And even if you do, you somehow have to hold the Presentation object in memory; that's going to be more of a problem than the S3 upload.

ghost closed this as completed Apr 22, 2019
ghost reopened this Apr 23, 2019
@ghost (Author) commented Apr 23, 2019

Hi @alexmojaki. Today I was trying to build a pptx file from 5 pptx files of sizes 298, 382, 170, 386, and 276 MB, but I'm getting an error:

37 [main] INFO alex.mojaki.s3upload.StreamTransferManager - Initiated multipart upload to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/sample.pptx with full ID NN_TlARbl1f146FanD.Vj.trPrHK4tPV5zTmbUanEr0bSeXb4eiJM.WpShmHYMJlna0WM8CkSoaXnnR6BhMamcp3U5kb7.E.ZFNTdGOV27o-
31221 [main] INFO alex.mojaki.s3upload.MultiPartOutputStream - Called close() on [MultipartOutputStream for parts 1 - 10000]
Java heap space: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
	at com.aspose.slides.internal.e5.void.setCapacity(Unknown Source)
	at com.aspose.slides.internal.e5.void.b(Unknown Source)
	at com.aspose.slides.internal.e5.void.write(Unknown Source)
	at com.aspose.slides.internal.eu.case.write(Unknown Source)
	at com.aspose.slides.internal.eu.case.write(Unknown Source)
	at com.aspose.slides.internal.eu.boolean.write(Unknown Source)
	at com.aspose.slides.internal.eu.goto.write(Unknown Source)
	at com.aspose.slides.internal.eu.public.byte(Unknown Source)
	at com.aspose.slides.internal.eu.public.case(Unknown Source)
	at com.aspose.slides.internal.eu.public.int(Unknown Source)
	at com.aspose.slides.internal.eu.switch.public(Unknown Source)
	at com.aspose.slides.internal.eu.switch.if(Unknown Source)
	at com.aspose.slides.acy.do(Unknown Source)
	at com.aspose.slides.Presentation.do(Unknown Source)
	at com.aspose.slides.Presentation.do(Unknown Source)
	at com.aspose.slides.Presentation.do(Unknown Source)
	at com.aspose.slides.Presentation.save(Unknown Source)
	at com.testing.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:74)
	at com.testing.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:1)

END RequestId: f43af541-0916-46b9-bf97-f01ea79542a0
REPORT RequestId: f43af541-0916-46b9-bf97-f01ea79542a0	Duration: 86033.18 ms	Billed Duration: 86100 ms 	Memory Size: 3000 MB	Max Memory Used: 2975 MB	
Java heap space
java.lang.OutOfMemoryError

Could you help me figure out where I'm going wrong?

@alexmojaki (Owner)

Did you upgrade to version 2.0.0?

@ghost (Author) commented Apr 23, 2019

Yes.

@ghost (Author) commented Apr 23, 2019

Here:


        Presentation finalPresentation = new Presentation();
        ISlideCollection finalPresentationSlides = finalPresentation.getSlides();

        for (int i = 0; i < keys.length; i++) {
            String key_name = keys[i];
            System.out.println(key_name);
            S3ObjectInputStream s3is = null;
            S3Object o = null;
            try {
                o = s3.getObject(bucket_name, key_name);
                s3is = o.getObjectContent();
            } catch (AmazonServiceException e) {
                System.err.println(e.getErrorMessage());
                System.exit(1);
            }

            Presentation sourcePresentation = new Presentation(s3is);
            ISlide slide = sourcePresentation.getSlides().get_Item(0);

            finalPresentationSlides.addClone(slide);

            try {
                s3is.close();
                sourcePresentation.dispose();
                o.close();
            } catch (IOException e) {
                e.printStackTrace();
            } 

        }

        final StreamTransferManager manager = new StreamTransferManager(bucket_name, "buildppt.pptx", s3Client);
        MultiPartOutputStream os = null;
        
        try {
            os = manager.getMultiPartOutputStreams().get(0);
            finalPresentation.save(os, SaveFormat.Pptx);
            
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            os.close();
        }
        manager.complete();

@alexmojaki (Owner)

It looks like Presentation.save doesn't incrementally write to the provided stream. Decompiling the class file with IntelliJ just shows stream.write(var3.toArray()). I put together some test code and indeed it just wrote one big array:

package alex.mojaki.s3upload.test;

import com.aspose.slides.ISlide;
import com.aspose.slides.ISlideCollection;
import com.aspose.slides.Presentation;
import com.aspose.slides.SaveFormat;

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.OutputStream;

public class SlidesTest {
    public static void main(String[] args) throws Exception {
        Presentation finalPresentation = new Presentation();
        ISlideCollection finalPresentationSlides = finalPresentation.getSlides();

        Presentation sourcePresentation = new Presentation(new FileInputStream("/Users/alexhall/Downloads/Presentation1.pptx"));
        ISlideCollection sourcePresentationSlides = sourcePresentation.getSlides();
        for (ISlide slide : sourcePresentationSlides) {
            for (int i = 0; i < 10; i++) {
                finalPresentationSlides.addClone(slide);
            }
        }
        System.out.println(finalPresentationSlides.size());
        sourcePresentation.dispose();

        OutputStream os = new MyStream();

        finalPresentation.save(os, SaveFormat.Pptx);
    }
}

class MyStream extends ByteArrayOutputStream {
    @Override
    public synchronized void write(byte[] b, int off, int len) {
        super.write(b, off, len);
        System.out.println(len);
    }
}

The output was 21 (the number of output slides) and 132278973, the number of bytes being written all at once. The source presentation contained a video, a GIF, and some text, across 2 slides.

So I'm sorry, but I don't see any way my library can help. You might have better luck writing to a file, but don't get too hopeful. You could also try contacting Aspose for help.
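
If you do try the file route, a sketch might look like this (assuming Lambda's /tmp has enough free space for the output file, which is itself a hard limit; Aspose's Presentation.save also accepts a file path):

File tmp = File.createTempFile("buildppt", ".pptx");  // java.io.tmpdir is /tmp on Lambda
try {
    finalPresentation.save(tmp.getAbsolutePath(), SaveFormat.Pptx);
    s3Client.putObject(bucket_name, "buildppt.pptx", tmp);  // the SDK streams the file for you
} finally {
    tmp.delete();  // free /tmp for the next invocation
}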

I'm going to close this since I don't think it's a problem with this library, but feel free to comment if you still need help.

@ghost (Author) commented Apr 24, 2019

Thanks, @alexmojaki.

@ghost (Author) commented Apr 25, 2019

> This has made me realise that requiring the user to call checkSize regularly is a problem, as the user may not always do the writing themselves. So I have released a new version 2.0.0 which does it automatically. Upgrade the version in your build system (pom.xml or build.gradle or whatever).
>
> Then it's as I said, this code should work:
>
> OutputStream os = manager.getMultiPartOutputStreams().get(0);
> presentation.save(os, SaveFormat.Pptx);
> os.close();

@alexmojaki it is not working; it's throwing an error:

5115 [main] INFO alex.mojaki.s3upload.StreamTransferManager - Initiated multipart upload to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com-capzoneimage/files/prdxn1212121.pptx with full ID Bhpb.e1aCXhJxXk7zvDora5vPLCp.30VFxDVluY4TieIZNdsZC51S2VwXWE8HEOkogYDFeuA8EH.J5.cNw4wdaE.R7tB9jrwgI9pCCoU4W4-
6292 [main] INFO alex.mojaki.s3upload.MultiPartOutputStream - Called close() on [MultipartOutputStream for parts 1 - 10000]
java.lang.NullPointerException
	at alex.mojaki.s3upload.MultiPartOutputStream.write(MultiPartOutputStream.java:142)
	at alex.mojaki.s3upload.MultiPartOutputStream.write(MultiPartOutputStream.java:148)
	at com.aspose.slides.Presentation.save(Unknown Source)
	at com.amazonaws.lambda.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:53)
	at com.amazonaws.lambda.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:1)
	at lambdainternal.EventHandlerLoader$PojoHandlerAsStreamHandler.handleRequest(EventHandlerLoader.java:178)
	at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:888)
	at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:293)
	at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:64)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:114)
8848 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Part number 1 containing 1.16 MB]: Uploading leftover stream null
8949 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com-capzoneimage/files/prdxn1212121.pptx with id Bhpb.e1aC...pCCoU4W4-]: Finished uploading [Part number 1 containing 1.16 MB]
9166 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com-capzoneimage/files/prdxn1212121.pptx with id Bhpb.e1aC...pCCoU4W4-]: Completed
END RequestId: c17bcf84-30a4-44de-b022-c83a8838e858

@alexmojaki (Owner)

That indicates that something tried to write to the stream after closing it. I don't know how that happened, but your code at the end has some other problems that need fixing, and that might help. Use this code instead:

os = manager.getMultiPartOutputStreams().get(0);
try {
    finalPresentation.save(os, SaveFormat.Pptx);
    os.close();
    manager.complete();
} catch (Throwable e) {
    manager.abort();
    throw new RuntimeException(e);  // or e.printStackTrace(); if you really want the code to continue
}

In particular you need to abort if there's an exception, and not complete (your code was always completing).

There's also no need to close if something goes wrong (i.e. in a finally). It shouldn't be a problem, but since you're having an error from a premature close, it's worth a try.
