
About generateData() function #11

Closed
ghost opened this issue Apr 18, 2019 · 15 comments

@ghost commented Apr 18, 2019

I recently came across this library in my browser, and as described in the README it looks like exactly what I need. Thanks, @alexmojaki.
So far I've just used the StreamTransferManager sample code with a small modification to the stream generator to handle larger data.

Now I'm having trouble fitting this library to my requirement, and it would be great if you could help me. Here is the whole picture:

  1. I have a ByteArrayOutputStream: ByteArrayOutputStream os = new ByteArrayOutputStream();
  2. Suppose os contains stream data.
  3. I want to upload that stream data to an S3 bucket with your library.

Please guide me. Thanks in advance.
@alexmojaki (Owner)

If you created the ByteArrayOutputStream yourself, then instead of creating it, create a StreamTransferManager with numStreams = 1 and then:

OutputStream os = manager.getMultiPartOutputStreams().get(0);

If for some reason you have to use a ByteArrayOutputStream, e.g. because it's created by some code you don't control, then I don't suggest using this library at all, because it seems you can't avoid keeping all the data in memory anyway.
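
For context, the complete lifecycle being described looks roughly like this (a minimal sketch in the fluent 2.x style used later in this thread; bucketName, key, and s3Client are placeholders, and writeYourDataTo is a hypothetical stand-in for whatever code produces the data):

StreamTransferManager manager = new StreamTransferManager(bucketName, key, s3Client)
        .numStreams(1); // a single writer, so a single part stream

OutputStream os = manager.getMultiPartOutputStreams().get(0);
try {
    writeYourDataTo(os);   // write here instead of into a ByteArrayOutputStream
    os.close();            // flushes and queues the final part
    manager.complete();    // finishes the multipart upload
} catch (Throwable e) {
    manager.abort();       // abandon the upload so no orphaned parts are left behind
    throw new RuntimeException(e);
}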

@ghost (Author) commented Apr 18, 2019

Thanks, @alexmojaki, for the quick reply.
Here is my sample code:

// lots of code, then:
ByteArrayOutputStream os = new ByteArrayOutputStream();
presentation.save(os, SaveFormat.Pptx); // presentation is a com.aspose.slides.Presentation
// end of code

Here com.aspose.slides.Presentation is the class that writes the stream data into os.
From there I want to upload the data to S3 with your library, without consuming memory or disk space.

@ghost (Author) commented Apr 18, 2019

@alexmojaki Any luck?

@alexmojaki (Owner)

This has made me realise that requiring the user to call checkSize regularly is a problem, as the user may not always do the writing themselves. So I have released a new version 2.0.0 which does it automatically. Upgrade the version in your build system (pom.xml or build.gradle or whatever).
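
For example, with Maven that would be a dependency entry along these lines (the groupId/artifactId below are an assumption based on the package name alex.mojaki.s3upload; check the project README for the exact coordinates):

<dependency>
    <groupId>alex.mojaki</groupId>
    <artifactId>s3-stream-upload</artifactId>
    <version>2.0.0</version>
</dependency>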

Then it's as I said, this code should work:

OutputStream os = manager.getMultiPartOutputStreams().get(0);
presentation.save(os, SaveFormat.Pptx);
os.close();

@alexmojaki (Owner)

It will definitely not write to disk at all. AWS should give you some metrics about RAM usage, try processing a big presentation and see what it says. If you don't change the default settings it should use very little memory and be able to process files up to 50GB. How big are these PowerPoint presentations? Are you sure you can't just hold the whole thing in memory and use the usual putObject method?
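
For comparison, the fully in-memory putObject route would look something like this (a sketch against the AWS SDK for Java v1; bucketName and key are placeholders, and the content length must be set when uploading from a stream):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
presentation.save(baos, SaveFormat.Pptx);      // the whole file ends up in memory here
byte[] bytes = baos.toByteArray();

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(bytes.length);       // required when uploading from an InputStream
s3Client.putObject(bucketName, key, new ByteArrayInputStream(bytes), metadata);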

@ghost (Author) commented Apr 21, 2019

I just built a 30 MB pptx file and got these metrics from AWS Lambda:

594679 [main] INFO alex.mojaki.s3upload.StreamTransferManager - Initiated multipart upload to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with full ID Q87ZCg8jNVG1_sLD8WITa7zxixZbC8slK3umuBara6vUzaoFr2cOO_hbMETSdgskr362dFSJbcfY8N2DaKfvOjEArPUTpsn_XSuROgsTewo-
597338 [main] INFO alex.mojaki.s3upload.MultiPartOutputStream - Called close() on [MultipartOutputStream for parts 1 - 10000]
597824 [pool-2-thread-1] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with id Q87ZCg8jN...ROgsTewo-]: Finished uploading [Part number 1 containing 25.59 MB]
597978 [pool-2-thread-1] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with id Q87ZCg8jN...ROgsTewo-]: Finished uploading [Part number 2 containing 5.00 MB]
598148 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/buildppt.pptx with id Q87ZCg8jN...ROgsTewo-]: Completed
END RequestId: 43d0850a-80c1-4adf-a367-1692069d7b6b
REPORT RequestId: 43d0850a-80c1-4adf-a367-1692069d7b6b	Duration: 27341.35 ms	Billed Duration: 27400 ms 	Memory Size: 3000 MB	Max Memory Used: 1020 MB	

It seems 1020 MB was used just to build a 30 MB pptx file, and I'm not using the putObject method anywhere. Here is the code:

String[] keys = input.get("keys"); // keys of the presentation files stored in the S3 bucket
Presentation finalPresentation = new Presentation();
ISlideCollection finalPresentationSlides = finalPresentation.getSlides();

for (int i = 0; i < keys.length; i++) {
    String key_name = keys[i];
    S3ObjectInputStream s3is = null;
    S3Object o = null;
    try {
        o = s3Client.getObject(bucket_name, key_name);
        s3is = o.getObjectContent(); // stream of the presentation file's bytes
    } catch (AmazonServiceException e) {
        System.err.println(e.getErrorMessage());
        System.exit(1);
    }
    Presentation sourcePresentation = new Presentation(s3is);
    ISlide slide = sourcePresentation.getSlides().get_Item(1); // get the second slide from the source presentation
    finalPresentationSlides.addClone(slide); // attach it to the new presentation object
    try {
        s3is.close(); // release objects from RAM on each iteration
        o.close();
        sourcePresentation.dispose();
    } catch (Exception e) {
        e.printStackTrace();
    }
} // end of loop

final StreamTransferManager manager = new StreamTransferManager(bucket_name, "buildppt1.pptx", s3Client);
try {
    MultiPartOutputStream os = manager.getMultiPartOutputStreams().get(0);
    finalPresentation.save(os, SaveFormat.Pptx);
    os.close();
} catch (Exception e) {
    e.printStackTrace();
}
manager.complete();

@alexmojaki (Owner)

I think this is the same issue as #2. The library isn't using the memory; it's just that Java automatically preallocates a lot of it. I don't know if there's a way to configure that in Lambda, but I don't think you need to (I don't think Lambda bills based on actual RAM usage).

Given that you have 3 GB available, I still don't know why you anticipate running out unless you process some massive presentations. And even if you do, you somehow have to hold the Presentation object in memory; that's going to be more of a problem than the S3 upload.

ghost closed this as completed Apr 22, 2019
ghost reopened this Apr 23, 2019
@ghost (Author) commented Apr 23, 2019

Hi @alexmojaki. Today I was trying to build a pptx file from 5 pptx files of sizes 298, 382, 170, 386, and 276 MB, but I'm getting an error:

37 [main] INFO alex.mojaki.s3upload.StreamTransferManager - Initiated multipart upload to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com/files/sample.pptx with full ID NN_TlARbl1f146FanD.Vj.trPrHK4tPV5zTmbUanEr0bSeXb4eiJM.WpShmHYMJlna0WM8CkSoaXnnR6BhMamcp3U5kb7.E.ZFNTdGOV27o-
31221 [main] INFO alex.mojaki.s3upload.MultiPartOutputStream - Called close() on [MultipartOutputStream for parts 1 - 10000]
Java heap space: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
	at com.aspose.slides.internal.e5.void.setCapacity(Unknown Source)
	at com.aspose.slides.internal.e5.void.b(Unknown Source)
	at com.aspose.slides.internal.e5.void.write(Unknown Source)
	at com.aspose.slides.internal.eu.case.write(Unknown Source)
	at com.aspose.slides.internal.eu.case.write(Unknown Source)
	at com.aspose.slides.internal.eu.boolean.write(Unknown Source)
	at com.aspose.slides.internal.eu.goto.write(Unknown Source)
	at com.aspose.slides.internal.eu.public.byte(Unknown Source)
	at com.aspose.slides.internal.eu.public.case(Unknown Source)
	at com.aspose.slides.internal.eu.public.int(Unknown Source)
	at com.aspose.slides.internal.eu.switch.public(Unknown Source)
	at com.aspose.slides.internal.eu.switch.if(Unknown Source)
	at com.aspose.slides.acy.do(Unknown Source)
	at com.aspose.slides.Presentation.do(Unknown Source)
	at com.aspose.slides.Presentation.do(Unknown Source)
	at com.aspose.slides.Presentation.do(Unknown Source)
	at com.aspose.slides.Presentation.save(Unknown Source)
	at com.testing.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:74)
	at com.testing.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:1)

END RequestId: f43af541-0916-46b9-bf97-f01ea79542a0
REPORT RequestId: f43af541-0916-46b9-bf97-f01ea79542a0	Duration: 86033.18 ms	Billed Duration: 86100 ms 	Memory Size: 3000 MB	Max Memory Used: 2975 MB	
Java heap space
java.lang.OutOfMemoryError

Could you help me figure out where I'm going wrong?

@alexmojaki (Owner)

Did you upgrade to version 2.0.0?

@ghost (Author) commented Apr 23, 2019

Yes.

@ghost (Author) commented Apr 23, 2019

Here:


        Presentation finalPresentation = new Presentation();
        ISlideCollection finalPresentationSlides = finalPresentation.getSlides();

        for (int i = 0; i < keys.length; i++) {
            String key_name = keys[i];
            System.out.println(key_name);
            S3ObjectInputStream s3is = null;
            S3Object o = null;
            try {
                o = s3.getObject(bucket_name, key_name);
                s3is = o.getObjectContent();
            } catch (AmazonServiceException e) {
                System.err.println(e.getErrorMessage());
                System.exit(1);
            }

            Presentation sourcePresentation = new Presentation(s3is);
            ISlide slide = sourcePresentation.getSlides().get_Item(0);

            finalPresentationSlides.addClone(slide);

            try {
                s3is.close();
                sourcePresentation.dispose();
                o.close();
            } catch (IOException e) {
                e.printStackTrace();
            } 

        }

        final StreamTransferManager manager = new StreamTransferManager(bucket_name, "buildppt.pptx", s3Client);
        MultiPartOutputStream os = null;
        
        try {
            os = manager.getMultiPartOutputStreams().get(0);
            finalPresentation.save(os, SaveFormat.Pptx);
            
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            os.close();
        }
        manager.complete();

@alexmojaki (Owner)

It looks like Presentation.save doesn't incrementally write to the provided stream. Decompiling the class file with IntelliJ just shows stream.write(var3.toArray()). I put together some test code and indeed it just wrote one big array:

package alex.mojaki.s3upload.test;

import com.aspose.slides.ISlide;
import com.aspose.slides.ISlideCollection;
import com.aspose.slides.Presentation;
import com.aspose.slides.SaveFormat;

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.OutputStream;

public class SlidesTest {
    public static void main(String[] args) throws Exception {
        Presentation finalPresentation = new Presentation();
        ISlideCollection finalPresentationSlides = finalPresentation.getSlides();

        Presentation sourcePresentation = new Presentation(new FileInputStream("/Users/alexhall/Downloads/Presentation1.pptx"));
        ISlideCollection sourcePresentationSlides = sourcePresentation.getSlides();
        for (ISlide slide : sourcePresentationSlides) {
            for (int i = 0; i < 10; i++) {
                finalPresentationSlides.addClone(slide);
            }
        }
        System.out.println(finalPresentationSlides.size());
        sourcePresentation.dispose();

        OutputStream os = new MyStream();

        finalPresentation.save(os, SaveFormat.Pptx);
    }
}

class MyStream extends ByteArrayOutputStream {
    @Override
    public synchronized void write(byte[] b, int off, int len) {
        super.write(b, off, len);
        System.out.println(len);
    }
}

The output was 21 (the number of output slides) and 132278973, the number of bytes being written all at once. The source presentation contained a video, a GIF, and some text, across 2 slides.

So I'm sorry, but I don't see any way my library can help. You might have better luck writing to a file, but don't get too hopeful. You could also try contacting Aspose for help.
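
If you do try the file route, a sketch might look like this (assuming Lambda's /tmp has enough free space for the output file, which is itself a hard limit; Aspose's Presentation.save also accepts a file path):

File tmp = File.createTempFile("buildppt", ".pptx");  // java.io.tmpdir is /tmp on Lambda
try {
    finalPresentation.save(tmp.getAbsolutePath(), SaveFormat.Pptx);
    s3Client.putObject(bucket_name, "buildppt.pptx", tmp);  // the SDK streams the file for you
} finally {
    tmp.delete();  // free /tmp for the next invocation
}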

I'm going to close this since I don't think it's a problem with this library, but feel free to comment if you still need help.

@ghost (Author) commented Apr 24, 2019

Thanks, @alexmojaki.

@ghost (Author) commented Apr 25, 2019

> This has made me realise that requiring the user to call checkSize regularly is a problem, as the user may not always do the writing themselves. So I have released a new version 2.0.0 which does it automatically. Upgrade the version in your build system (pom.xml or build.gradle or whatever).
>
> Then it's as I said, this code should work:
>
> OutputStream os = manager.getMultiPartOutputStreams().get(0);
> presentation.save(os, SaveFormat.Pptx);
> os.close();

@alexmojaki it is not working; it's throwing an error:

5115 [main] INFO alex.mojaki.s3upload.StreamTransferManager - Initiated multipart upload to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com-capzoneimage/files/prdxn1212121.pptx with full ID Bhpb.e1aCXhJxXk7zvDora5vPLCp.30VFxDVluY4TieIZNdsZC51S2VwXWE8HEOkogYDFeuA8EH.J5.cNw4wdaE.R7tB9jrwgI9pCCoU4W4-
6292 [main] INFO alex.mojaki.s3upload.MultiPartOutputStream - Called close() on [MultipartOutputStream for parts 1 - 10000]
java.lang.NullPointerException
	at alex.mojaki.s3upload.MultiPartOutputStream.write(MultiPartOutputStream.java:142)
	at alex.mojaki.s3upload.MultiPartOutputStream.write(MultiPartOutputStream.java:148)
	at com.aspose.slides.Presentation.save(Unknown Source)
	at com.amazonaws.lambda.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:53)
	at com.amazonaws.lambda.demo.LambdaFunctionHandler.handleRequest(LambdaFunctionHandler.java:1)
	at lambdainternal.EventHandlerLoader$PojoHandlerAsStreamHandler.handleRequest(EventHandlerLoader.java:178)
	at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:888)
	at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:293)
	at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:64)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:114)
8848 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Part number 1 containing 1.16 MB]: Uploading leftover stream null
8949 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com-capzoneimage/files/prdxn1212121.pptx with id Bhpb.e1aC...pCCoU4W4-]: Finished uploading [Part number 1 containing 1.16 MB]
9166 [main] INFO alex.mojaki.s3upload.StreamTransferManager - [Manager uploading to ec2-13-126-59-226.ap-south-1.compute.amazonaws.com-capzoneimage/files/prdxn1212121.pptx with id Bhpb.e1aC...pCCoU4W4-]: Completed
END RequestId: c17bcf84-30a4-44de-b022-c83a8838e858

@alexmojaki (Owner)

That indicates that something tried to write to the stream after closing it. I don't know how that happened, but your code at the end has some other problems that need fixing, and that might help. Use this code instead:

os = manager.getMultiPartOutputStreams().get(0);
try {
    finalPresentation.save(os, SaveFormat.Pptx);
    os.close();
    manager.complete();
} catch (Throwable e) {
    manager.abort();
    throw new RuntimeException(e);  // or e.printStackTrace(); if you really want the code to continue
}

In particular you need to abort if there's an exception, and not complete (your code was always completing).

There's also no need to close if something goes wrong (i.e. in a finally). It shouldn't be a problem, but since you're having an error from a premature close, it's worth a try.
